LOCAL ASSESSMENT OF PERTURBATIONS
By
GLEN LAWSON HARTLESS
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2000
Copyright 2000
by
Glen Lawson Hartless
To Christy
ACKNOWLEDGMENTS
I would first like to thank my chairman, Jim Booth, and cochairman, Ramon
Littell, for all of their help and guidance. Their combined knowledge, insight, and
mentoring has been extraordinary. I would also like to express my gratitude to them
for allowing me to pursue avenues of research that are probably closer to my heart
than theirs. Thanks also go to the other esteemed professors who served on my
supervisory committee: James Algina, Randy Carter, and James Hobert.
I would also like to acknowledge people who have influenced me academically
and professionally over the years: Chuck Nowalk, John Grant, and many others
in the Phillipsburg and Lopatcong school systems, Walter Pirie and the faculty
at Virginia Tech, Ron DiCola at AT&T/Lucent, Larry Leemis at the College of
William & Mary, Danny Perkins, Bo Beaulieu, and many others at the University of
Florida, and also fellow students Jonathan Hartzel, Phil McGoff, and Galin Jones,
among many others. I would also like to thank Allan West for maintaining an
extremely reliable computing environment.
Thanks also go to my parents for their love and support and for instilling in
me the discipline needed to achieve my goals.
Finally, my utmost thanks go to my wife, Christine. I am forever grateful for
her unequivocal love, unending patience, and thoughtful insight.
TABLE OF CONTENTS
page
ACKNOWLEDGMENTS iv
ABSTRACT vii
CHAPTERS
1 INTRODUCTION TO LOCAL INFLUENCE 1
1.1 Premise 1
1.2 Measuring the Effects of Perturbations 1
1.3 Local Influence Diagnostics 3
1.4 Perturbations in Regression 8
1.5 Basic Perturbations 18
1.6 Brief Literature Review 18
1.7 Outline 20
2 COMPUTATIONAL ISSUES IN LOCAL INFLUENCE ANALYSIS 21
2.1 Curvatures and the Direction of Maximum Curvature .... 21
2.2 Local Influence Analysis of a Profile Likelihood Displacement 23
2.3 Building Blocks of Local Influence for Regression 24
2.4 Perturbations to the Response in Regression 27
2.5 Unequal Measurement Precision Applied to Regression ... 31
2.6 Benchmarks for Perturbation Analyses 36
2.7 Local Assessment Under Reparameterizations 40
3 PERTURBATIONS TO THE X MATRIX IN REGRESSION ... 49
3.1 Perturbations to Explanatory Variables 49
3.2 Perturbations to a Single Column 50
3.3 Perturbations to a Single Row 59
3.4 Perturbations to the Entire Design Matrix 65
4 LOCAL ASSESSMENT FOR OPTIMAL ESTIMATING FUNCTIONS 87
4.1 Inferential Platform 88
4.2 Local Assessment of Perturbations 93
4.3 Acceleration, Curvature and Normal Curvature 96
4.4 Algebraic Expressions for Local Assessment 98
4.5 Computational Formulas for J and P 103
4.6 Local Assessment for Maximum Likelihood Inference 110
4.7 First-Order Assessment of the Log Generalized Variance ... 116
4.8 Local Assessment for Quasi-Likelihood Inference 125
5 LOCAL ASSESSMENT OF PREDICTIONS 134
5.1 Prediction via First Principles 134
5.2 Predictive Likelihood 136
5.3 Influence Measures for Prediction 138
5.4 Estimation and Prediction in Linear Mixed Models 140
5.5 Local Influence Methods for Linear Mixed Models 146
5.6 One-Way Random Effects Model 151
5.7 Residual Variance Perturbation Scheme 154
5.8 Cluster Variance Perturbation Schemes 159
5.9 Application to Corn Crop Data 167
6 MULTISTAGE ESTIMATIONS AND FUTURE RESEARCH ... 185
6.1 Local Assessment in Multistage Estimations 185
6.2 Synopsis and Additional Future Research 186
APPENDICES
A MODEL FITTING INFORMATION FOR SCHOOL DATA .... 188
B MODEL FITTING INFORMATION FOR CORN CROP DATA . . 191
REFERENCES 194
BIOGRAPHICAL SKETCH 202
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
LOCAL ASSESSMENT OF PERTURBATIONS
By
Glen Lawson Hartless
August 2000
Chairman: James G. Booth
Major Department: Statistics
Statistical models are useful and convenient tools for describing data.
However, models can be adversely affected by mistaken information, poor
assumptions, and small subsets of influential data. Seemingly inconsequential
changes to the observed data, the postulated model, or the inference technique may
greatly affect the scientific conclusions made from model fitting. This dissertation
considers local assessment of perturbations in parametric and semiparametric
models. The effects of perturbation are assessed using a Taylor-series approximation
in the vicinity of the fitted model, leading to graphical diagnostics that allow the
analyst to identify sensitive parameter estimates and sources of undue influence.
First, computational formulas are given to assess the influence of perturbations
on linear combinations of the parameter estimates for maximum likelihood. A
diagnostic plot that identifies sensitive fitted values and self-predicting observations
in linear regression is introduced. Second, the eigensystem of the acceleration matrix
for perturbing each element of the design matrix in linear regression is derived.
This eigensystem leads to a diagnostic plot for identifying specific elements that
are influential, and it is shown how the results relate to principal components,
collinearity, and influence diagnostics.
Next, it is shown that the conceptual framework and computational tools
easily extend from maximum likelihood to inference that is performed using
estimating equations. In particular, local assessment of parameter estimates and
standard errors derived from optimal estimating functions is addressed. Algebraic
expressions and computational formulas are given for both first- and second-order
assessment, and applications such as quasilikelihood are considered.
Finally, the techniques are applied to linear mixed models in order to assess
the influence of individual observations and clusters. While previous applications
considered only estimation, the effects of perturbations on both estimation and
prediction are addressed. It is shown that the effects on the fixed and random effect
inferences can be assessed simultaneously by using an influence measure based on
the unstandardized predictive loglikelihood of the random effects.
CHAPTER 1
INTRODUCTION TO LOCAL INFLUENCE
1.1 Premise
George Box (1982, p. 34) stated "all models are wrong; some models are
useful," and A. F. Desmond (1997b, p. 117) stated "a robust method ought not to
perform poorly in neighborhoods of the nominally true model." Indeed, conclusions
based on a statistical model should be reliable under small perturbations to the
data, the model's assumptions, or the particulars of the estimation method. In a
seminal paper, Cook (1986) utilized the basic concept of perturbing the postulated
model or the observed data to develop a variety of diagnostic tools. He showed how
small perturbations can be used to develop easily computable diagnostics for almost
any model estimated by maximum likelihood (ML).
1.2 Measuring the Effects of Perturbations
In a statistical analysis, parameter estimates are a crucial part of the
decisionmaking process. These estimates are a function of the observed data, the
postulated model, and the analyst's choice of statistical inference technique. Any
one of these three aspects of a statisticalmodeling situation is subject to uncertainty
and doubt: observed data may be incorrectly recorded or subject to rounding
error, a slightly different model may be just as plausible as the postulated model,
and nuisance aspects of modeling such as choosing hyperparameters are subject to
secondguessing. If small changes in these aspects of a statistical modeling problem
create large changes in parameter estimates, the analysis is unreliable. Therefore,
the change in parameter estimates due to perturbation can serve as a barometer for
the robustness of the scientific conclusions.
Following Cook (1986), a vector of q perturbations is denoted as
ω = (ω₁, ω₂, ..., ω_q)′, and the q-dimensional space of possible perturbations is
denoted as Ω. Included in this space is one point, ω₀, that does not change
the original formulation of the problem; this is called the null perturbation.
The perturbations are not parameters to be estimated; rather, they are
known real-valued quantities introduced into the problem by the analyst. These
perturbations might represent measurement errors, unequal variances, case weights,
correlated errors, missing predictors, or some other change to the data, model, or
inference method. Prudent choice of perturbations can lead to useful diagnostics.
For example, if parameter estimates are sensitive to perturbing the contribution
(weight) of a particular observation to estimation, it indicates that the observation
is overly influential on the estimates.
For data y and parameters θ, let L(θ; y) denote the log-likelihood from
the original postulated model, and let L_ω(θ; y) denote the log-likelihood resulting
from the inclusion of perturbations. If the perturbation specified is the null
perturbation, then the two likelihoods are equivalent. The maximum likelihood
estimators (MLEs) from these two likelihoods are denoted as θ̂ and θ̂_ω,
respectively. Cook (1986) used the likelihood displacement to summarize parameter
estimate changes under perturbation:

LD(ω) = 2[L(θ̂; y) − L(θ̂_ω; y)].   (1.1)

From inspection of (1.1), we see that the likelihood displacement is the difference of
the original likelihood evaluated at two values: the original MLEs and the perturbed
MLEs. Because θ̂ maximizes the original likelihood, LD(ω) is nonnegative and
achieves its minimum value of zero when ω = ω₀. As the perturbed MLEs become
increasingly different from the original MLEs, LD(ω) becomes larger. Thus, the
size of LD(ω) provides a measure of how much the perturbation affects parameter
estimation.
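To make (1.1) concrete, LD(ω) can be computed by brute force: refit the perturbed model and evaluate the original log-likelihood at both sets of MLEs. The following is a minimal sketch for the normal linear model under an additive response perturbation, using simulated data and assuming NumPy is available (this is an illustration, not a computation from the text):

```python
import numpy as np

def mle(y, X):
    # MLEs for the normal linear model: OLS coefficients and sigma^2 = RSS / n
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return beta, (r @ r) / len(y)

def loglik(beta, s2, y, X):
    # Normal log-likelihood L(beta, sigma^2; y)
    n, r = len(y), y - X @ beta
    return -0.5 * n * np.log(2 * np.pi * s2) - 0.5 * (r @ r) / s2

def likelihood_displacement(y, X, omega):
    # LD(omega) = 2 [ L(theta_hat; y) - L(theta_hat_omega; y) ],
    # where theta_hat_omega is fitted to the perturbed data y + omega
    b0, s0 = mle(y, X)
    bw, sw = mle(y + omega, X)
    return 2 * (loglik(b0, s0, y, X) - loglik(bw, sw, y, X))

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=20)

print(likelihood_displacement(y, X, np.zeros(20)))  # 0.0 at the null perturbation
```

Any nonzero ω yields LD(ω) ≥ 0, since θ̂ maximizes the original log-likelihood.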
Effects on a subset of parameters, θ₁ from θ′ = (θ₁′, θ₂′), can be obtained by
using the profile likelihood displacement:

LD^[1](ω) = 2[L(θ̂₁, θ̂₂; y) − L(θ̂₁ω, g(θ̂₁ω); y)],   (1.2)

where θ̂₁ω is from θ̂ω′ = (θ̂₁ω′, θ̂₂ω′)′, and g(θ̂₁ω) is the vector that maximizes the profile
log-likelihood, L(θ̂₁ω, θ₂), with respect to θ₂. The profile likelihood displacement
allows influence to be "directed" through θ̂₁, with the other parameter estimates
only changing in sympathy.
In the next section it is shown how to obtain diagnostic tools from a
local influence analysis of the likelihood displacement and the profile likelihood
displacement.

1.3 Local Influence Diagnostics
Although we could simply choose values for ω and compute LD(ω), this
is not particularly useful for two reasons. First, LD(ω) is often not a closed-form
function of ω. Hence, if we wish to find its value, refitting the model may be
necessary. Second, even if calculating LD(ω) were easy, we must somehow decide
what values to use for ω. This choice would be highly data-dependent and a
nuisance for the analyst.

Therefore, we would like a methodology that can examine the information
contained in the function LD(ω) and yet does not require a choice of actual values
for ω nor involve re-estimating the model parameters. Following Cook (1986),
this can be accomplished using the notion of curvature. Simple plots can then be
used to identify whether estimation is sensitive to any particular elements of the
q-dimensional perturbation.
Figure 1.1: Schematic of a plane curve with its tangent line. The angle between the
tangent line and the x-axis is φ.
1.3.1 Curvature of a Plane Curve

Curvature is a measure of how quickly a curve turns or twists at a particular
point (Swokowski 1988). Let (f(t), g(t)) be a smooth plane curve. The arc length
parameter s of such a curve is the distance travelled along the curve from a fixed
point A to another point on the curve (Figure 1.1). Letting t_A denote the value of t
that yields point A, s can be computed as

s(t) = ∫_{t_A}^{t} ((f′(u))² + (g′(u))²)^{1/2} du,   (1.3)

and we may express the curve in terms of s: (f(s⁻¹(s)), g(s⁻¹(s))). Additionally, let
T(s) be the tangent line at a point on the curve, and let φ denote the angle between
T(s) and the x-axis.
Curvature is defined as the absolute value of the rate of change in φ with
respect to s:

C = |dφ/ds|.   (1.4)

A plane curve that is turning quickly at (f(t), g(t)) has a larger curvature than a
curve that is nearly straight at that point.
Figure 1.2: Curve with two circles of curvature

The reciprocal of curvature is called the radius of curvature, and it is the
radius of the best-fitting circle at that point on the curve. These circles of curvature
have the same first and second derivatives and tangent line as the curve and lie on
its concave side. Figure 1.2 shows the curve (sin(t), ½sin(2t)) and two of its circles
of curvature. The circle on the left occurs at one value of t, while the other occurs
when t = arccos(0). Their curvatures are 6 and 8/9, respectively. The expression
"turning on a dime" refers to a sharp turn: one in which the radius of curvature
is small, and consequently the curvature of the path is large.
The formula for curvature is

C = |f′(t)g″(t) − g′(t)f″(t)| / [(f′(t))² + (g′(t))²]^{3/2}.   (1.5)

We are particularly interested in assessing curvature at the local minimum of a
function from x to y, i.e., when x = t. For this special case, curvature measures
how quickly the function increases from that minimum value. In this case, (1.5)
simplifies to the acceleration of the curve:

C = d²g(x)/dx² |_{x=x₀},   (1.6)

where g(x₀) is the function's minimum.

Figure 1.3: Likelihood displacement with directional vectors

This result can be applied to LD(ω) at ω₀ when there is only a single
perturbation, ω. Here, the curvature is given by

C = d²LD(ω)/dω² |_{ω=ω₀},   (1.7)

and it measures how quickly the parameter estimates change when ω is moved
away from the null perturbation. A large curvature indicates that even if ω is close,
or local, to ω₀, the likelihood displacement increases substantially. Hence, a
large curvature indicates that even a slight perturbation can be very influential on
parameter estimates.
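For a single perturbation, the curvature in (1.7) can be checked numerically with a finite-difference second derivative. The sketch below uses a hypothetical normal-mean model with known unit variance, where LD(ω) = ω²/n exactly, so the curvature at the null perturbation is 2/n (NumPy assumed; this example is not from the text):

```python
import numpy as np

def ld(y, omega, i=0):
    # Likelihood displacement for the normal-mean model with known unit
    # variance, perturbing observation i additively by omega.
    # The log-likelihood constants cancel, leaving sums of squares.
    theta_hat = y.mean()
    yp = y.copy()
    yp[i] += omega
    theta_w = yp.mean()                       # perturbed MLE
    return ((y - theta_w) ** 2).sum() - ((y - theta_hat) ** 2).sum()

y = np.array([1.2, -0.3, 0.5, 2.0, 0.9])      # n = 5
h = 1e-3
curvature = (ld(y, h) - 2 * ld(y, 0.0) + ld(y, -h)) / h**2
print(curvature)  # ≈ 2/n = 0.4
```

Because LD is exactly quadratic in ω here, the central difference recovers the curvature essentially to machine precision.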
1.3.2 Using Plane Curves to Describe LD(ω)

The curvature of plane curves can also be used to describe LD(ω) when
q > 1. Now ω can be moved away from ω₀ in different directions, as shown by
the vectors in Figure 1.3 for a two-dimensional perturbation. These vectors can be
written as

ω_v(a) = ω₀ + av,   (1.8)

where the vector v determines the direction and the value a determines the position
relative to ω₀.

Figure 1.4: Intersection of plane and surface creates space curve

Let P_v denote a plane that contains ω_v(a) and extends directly
upward from Ω to intersect LD(ω) (Figure 1.4). The intersection of P_v and LD(ω)
forms a space curve, (ω_v(a)′, LD(ω_v(a)))′. Because this space curve is contained
within the plane P_v, its curvature can be obtained as if it were a plane curve,
(a, LD(ω_v(a))). Thus the curvature of LD(ω_v(a)) at ω₀ is simply

C_v = d²LD(ω_v(a))/da² |_{a=0}.   (1.9)

Different directions produce curvatures of various sizes. The normalized vector that
produces the largest curvature is denoted v_max and is of particular interest.
1.3.3 Direction of Maximum Curvature, v_max

The relative sizes of the elements of v_max indicate which elements of ω are
most influential on the parameter estimates. For example, if the i-th element of v_max
is large, then ωᵢ is a key player in producing the largest curvature of LD(ω) at the
null perturbation, i.e., in producing an immediate change in parameter estimates.

¹ More formally, these space curves are called normal sections. The term comes
from the fact that each plane, P_v, contains the surface's principal normal vector, the
vector that is orthogonal to the surface's tangent plane at ω₀. Curvatures of the
normal sections are referred to as normal curvatures.
The direction of maximum curvature is the main diagnostic of Cook's local
influence methodology. Typically, v_max is standardized to have length one and is
then plotted against indices 1, 2, ..., q. If any elements are particularly large relative
to the others, they stand out in the plot. The analyst then may want to take steps
to alleviate the sensitivity to the perturbation. What, if any, steps are taken
depends upon the perturbation scheme and the particulars of the data analysis.

For some models and perturbation schemes, the formula for v_max is known
explicitly, while in other situations it requires some moderate computation. In
analyses such as regression, v_max is typically a function of familiar quantities.
1.4 Perturbations in Regression

Consider the normal linear regression model:

Y = Xβ + ε,   (1.10)

where Y is an n × 1 vector of responses, X is a known n × k model matrix of rank k,
β is a k × 1 vector of unknown parameters, and ε is an n × 1 vector of independent,
identically distributed normal random variables with mean 0 and variance σ² > 0.
Hence, Y ~ N(Xβ, σ²Iₙ), and θ = (β₁, ..., β_k, σ²)′.
1.4.1 Scheme 1: Measurement Error in the Response

In order to examine the effects of measurement error in the response,
Schwarzmann (1991) introduced perturbations to the observed data. He considered
fitting a model to the observed data set with slight perturbations: yᵢ + ωᵢ, for
i = 1, ..., q. Hence, ω₀ is a vector of zeroes for this perturbation scheme. By letting
q = n, all of the observations can be perturbed simultaneously. This allows the
■ J*
Table 1.1: Scottish hills race data

Observation   Record Time (min.)   Distance (miles)   Climb (ft)
 1             16.083               2.5                 650
 2             48.350               6.0                2500
 3             33.650               6.0                 900
 4             45.600               7.5                 800
 5             62.267               8.0                3070
 6             73.217               8.0                2866
 7            204.617              16.0                7500
 8             36.367               6.0                 800
 9             29.750               5.0                 800
10             39.750               6.0                 650
11            192.667              28.0                2100
12             43.050               5.0                2000
13             65.000               9.5                2200
14             44.133               6.0                 500
15             26.933               4.5                1500
16             72.250              10.0                3000
17             98.417              14.0                2200
18             78.650               3.0                 350
19             17.417               4.5                1000
20             32.567               5.5                 600
21             15.950               3.0                 300
22             27.900               3.5                1500
23             47.633               6.0                2200
24             17.933               2.0                 900
25             18.683               3.0                 600
26             26.217               4.0                2000
27             34.433               6.0                 800
28             28.567               5.0                 950
29             50.500               6.5                1750
30             20.950               5.0                 500
31             85.583              10.0                4400
32             32.383               6.0                 600
33            170.250              18.0                5200
34             28.100               4.5                 850
35            159.833              20.0                5000

Source: Atkinson (1986)
analyst to determine which observations would have the most effect on parameter
estimates in the event that there is measurement or rounding error.
1.4.1.1 Influence on the full vector of parameter estimates

Atkinson (1986) analyzed a data set of record times for 35 different hill races,
as recorded by the Scottish Hill Runners Association (Table 1.1). Scatter plots
indicate the positive association of the three variables: winning time, race distance,
and race climb (Figure 1.5). The influence of perturbations on the parameter
estimates from a regression of winning time on DISTANCE and CLIMB is of
interest. For convenience, the regressors and response have been centered and scaled,
thereby eliminating the need for an intercept. Thus, the full vector of p parameters
is θ = (β_dist, β_climb, σ²)′.

Figure 1.6 shows v_max plotted against the indices 1 through q = n = 35 for
the Scottish hills data. Element 18, which corresponds to perturbation of the 18th
observation, is the dominant contributor to the direction of maximum curvature.
This observation is from a short race with little CLIMB, and yet it has a large
record time. Similar races have much smaller record times, indicating that race 18
is an outlier. Atkinson (1986) has actually suggested that the winning time for this
race may have been 18.67 minutes rather than 1 hour and 18.67 minutes.

Interestingly, it can be shown that the direction of maximum curvature for
Scheme 1 is proportional to the residuals. Thus, if we standardized the vector
of residuals to length 1 and plotted them against case indices, we would obtain
Figure 1.6. This implies that parameter estimates are most sensitive to a set of
small additive measurement errors when those errors tend to exaggerate (or correct)
the model's error in estimating each data point, with the most exaggeration (or
correction) occurring for the most poorly predicted points.
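The proportionality claim can be checked numerically: compute LD(ω) by refitting under the response perturbation, build the matrix of second derivatives at ω₀ by central differences, and compare its leading eigenvector with the residual vector. A sketch with simulated data (not the hills data), assuming NumPy:

```python
import numpy as np

def fit(y, X):
    # MLEs for the normal linear model: OLS beta and sigma^2 = RSS / n
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return beta, (r @ r) / len(y)

def loglik(beta, s2, y, X):
    n, r = len(y), y - X @ beta
    return -0.5 * n * np.log(2 * np.pi * s2) - 0.5 * (r @ r) / s2

def LD(y, X, w):
    # Likelihood displacement under the response perturbation y + w
    b0, s0 = fit(y, X)
    bw, sw = fit(y + w, X)
    return 2 * (loglik(b0, s0, y, X) - loglik(bw, sw, y, X))

rng = np.random.default_rng(1)
n = 6
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.5]) + rng.normal(size=n)

# Matrix of second derivatives of LD at the null perturbation (central differences)
h = 1e-4
I = np.eye(n)
F = np.empty((n, n))
for i in range(n):
    for j in range(n):
        F[i, j] = (LD(y, X, h * (I[i] + I[j])) - LD(y, X, h * (I[i] - I[j]))
                   - LD(y, X, h * (I[j] - I[i])) + LD(y, X, -h * (I[i] + I[j]))) / (4 * h * h)

v_max = np.linalg.eigh(F)[1][:, -1]
e = y - X @ fit(y, X)[0]
cos = abs(v_max @ (e / np.linalg.norm(e)))
print(cos)  # ≈ 1: v_max is proportional to the residual vector
```

The absolute value is taken because a standardized direction is only unique up to sign.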
Figure 1.5: Hills data; scatterplots with highlighted cases. (a) Winning time vs. race distance; (b) winning time vs. race climb; (c) race climb vs. race distance. Races 7, 11, and 18 are highlighted in the panels.
Figure 1.6: Hills data; response perturbation; v_max for θ̂
Figure 1.7: Hills data; response perturbation. (a) v_max for directed influence on β̂_dist; (b) v_max for directed influence on β̂_climb.
1.4.1.2 Influence on a subset of parameter estimates

Plotting v_max for the profile likelihood displacement, LD^[1](ω), can indicate
which components of ω affect a subset of parameters. Examining influence on
θ₁ = σ² results in the same v_max as the original likelihood displacement, i.e., it is
proportional to the residuals. The directions of maximum curvature for θ₁ = β_dist
and θ₁ = β_climb are shown in Figures 1.7(a) and 1.7(b). The first plot indicates that
small measurement error in race 11's winning time is most influential on β̂_dist. In the
plot of v_max for β̂_climb, elements 7 and 11 stand out. The fact that the two elements
have opposite signs can be used for interpretation. They indicate that maximum
curvature is primarily achieved by perturbing winning times 7 and 11 in opposite
directions (i.e., making one observation smaller and the other larger). Thus, if the
record winning times for races 7 and 11 were slightly closer together or farther apart,
this would affect the slope estimate for CLIMB.
1.4.1.3 Plotting options

Note that the standardized v_max is only unique up to sign change. For
example, Figure 1.8 displays the same standardized vector as Figure 1.6, but with
all of the elements' signs switched. This suggests an alternative plot in which the
signs of the elements are suppressed: a star plot. Here, each element is displayed
as a point, with the distance from the origin equal to its absolute value. The q
elements are simply displayed around the point (0, 0), starting in the first quadrant
and ending in the fourth quadrant. The plot is completed by labelling the points
with indices. Labeling all points can result in a cluttered plot (1.9(a)), and so a better
option is to label only the largest elements (1.9(b)). In these renderings, element
18 is again noticeably large, but our attention is also drawn to element 7. This
observation has the characteristics of an influential observation: it has the highest
climb and one of the longest distances (Figure 1.5(c)), as well as the second largest
residual.

Figure 1.8: Hills data; alternate rendering of v_max

Unlike the index plot, the star plot does not give the illusion that the
indices are in a specific order. However, the suppression of element signs can be an
advantage or a disadvantage. The advantage is that when looking at several plots
of directional vectors, suppressing the signs helps the analyst identify patterns by
emphasizing element size. The disadvantage is that the diagnostic interpretation of
the signs is lost.
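The star-plot construction just described reduces to a small coordinate transformation: sweep the q indices over angles from the first quadrant around to the fourth, and plot each element at a distance equal to its absolute value. A sketch of one plausible layout (the exact angular spacing behind the text's figures is not specified), assuming NumPy:

```python
import numpy as np

def star_coords(v):
    # Star-plot coordinates: element i is drawn at radius |v_i|, with the q
    # angles sweeping counterclockwise from the first quadrant around to the
    # fourth, so element signs are suppressed and no index order is implied.
    v = np.asarray(v, dtype=float)
    q = len(v)
    theta = 2 * np.pi * (np.arange(q) + 0.5) / q
    return np.abs(v) * np.cos(theta), np.abs(v) * np.sin(theta)

v_max = np.array([0.10, -0.80, 0.05, 0.30])   # illustrative directional vector
x, y = star_coords(v_max)
print(np.hypot(x, y))  # radii equal the absolute element values
```

The points (x, y) would then be scattered and the largest elements labelled with their indices.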
Figure 1.9: Hills data; v_max as a star plot. (a) All elements numbered; (b) largest elements numbered.
1.4.2 Scheme 2: Perturbations to Measurement Precision

Another assumption that can be perturbed is that of equal variance (Cook
1986). This is accomplished by letting V(yᵢ) = σ²/ωᵢ for i = 1, ..., n. Here the null
perturbation is ω₀ = 1ₙ, an n-vector of ones.
With this perturbation scheme, we deem that some observations are observed
with more precision than others. The resulting perturbed MLEs weight each case
according to its perturbed variance. If certain cases are influential, then even a
slight downweighting or upweighting of them produces large displacements.

This perturbation scheme has an important connection to the widely used
influence measure Cook's Distance (Cook 1977). Recall that the i-th Cook's Distance
is a measure of the change in regression parameter estimates that results from
deleting the i-th observation. Clearly a deleted observation provides no information for
finding β̂. Similarly, as the i-th perturbation is decreased from 1, the i-th observation
provides less and less information to the estimation process owing to its increased
variance. Thus, case-deletion can be thought of as a rather drastic perturbation:
one that results in a variance → ∞ for the i-th observation. Case-deletion is referred
to as a global influence technique because it assesses the effect of perturbations that
are not local to the null perturbation in Ω.
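The deletion limit can be verified directly: refit with the i-th case's weight driven toward zero (equivalently, its variance driven toward infinity) and compare with the fit that omits the case. A sketch with simulated data, assuming NumPy:

```python
import numpy as np

def wls(y, X, w):
    # Weighted least squares; weight w_i acts like 1 / (perturbed variance_i)
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

rng = np.random.default_rng(2)
n = 10
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, -2.0]) + rng.normal(size=n)

i = 3
w = np.ones(n)
w[i] = 1e-8                                    # nearly delete observation i
b_perturbed = wls(y, X, w)
b_deleted = wls(np.delete(y, i), np.delete(X, i, axis=0), np.ones(n - 1))
print(np.allclose(b_perturbed, b_deleted, atol=1e-5))  # True
```

As the weight shrinks further, the weighted fit converges to the case-deleted fit, which is why case-deletion is the "drastic" endpoint of this perturbation.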
1.4.2.1 Influence on the full vector of parameter estimates

Figure 1.10(a) shows the direction of maximum curvature for the Scottish
hills data set. Again, element 18 dominates the direction of maximum curvature.
This implies that the precision of observation 18 is the most influential on the full
set of parameter estimates.

1.4.2.2 Influence on σ²

Using the profile likelihood displacement, influence directed upon σ² alone
can be assessed. Observation 18 is again implicated by v_max as being most locally
influential (Figure 1.10(b)).

1.4.2.3 Influence on individual regression parameter estimates

Figure 1.10(d) shows v_max when isolating effects on β̂_dist. Although there are
several points that stand out, observation 7 is identified as being most influential
since element 7 is large. This race is also implicated as being locally influential in
the plot of v_max for β̂_climb (Figure 1.10(c)). In fact, the DFBETAS diagnostic
(Belsley et al. 1980) also identifies race 7 as being influential on β̂_climb. Recall that
the value of DFBETAS_{i,j} measures how the j-th parameter estimate is affected
by deleting the i-th observation. Writing the n DFBETAS for the j-th parameter
estimate in vector form, it is shown in Section 2.5 that this vector is proportional to
the following Hadamard (elementwise) product of vectors:
Figure 1.10: Hills data; variance perturbation. (a) v_max for θ̂; (b) v_max for σ̂²; (c) v_max for β̂_climb; (d) v_max for β̂_dist; (e) v_max for β̂; (f) v₂ for β̂.
Figure 1.11: Standardized vector of DFBETAS for β̂_climb
(1/(1−h₁₁), 1/(1−h₂₂), ..., 1/(1−hₙₙ))′ ⊙ (r₁, r₂, ..., rₙ)′ ⊙ (r₁ⱼ, r₂ⱼ, ..., rₙⱼ)′.   (1.11)

Here, hᵢᵢ is the i-th diagonal element of the "hat" matrix X(X′X)⁻¹X′, rᵢ is the i-th
residual, and rᵢⱼ is the i-th residual from regressing Xⱼ on the other columns of X.
It can be shown (see Section 2.5) that v_max for directing the effects of the variance
perturbation on β̂ⱼ is proportional to

(r₁, r₂, ..., rₙ)′ ⊙ (r₁ⱼ, r₂ⱼ, ..., rₙⱼ)′.   (1.12)

The elements of the two vectors differ only by a factor of 1/(1 − hᵢᵢ). In fact,
standardizing the vector of 35 DFBETAS for β̂_climb to length 1 and plotting them
(Figure 1.11) produces a figure nearly identical to Figure 1.10(c).
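Setting aside the scale factor that standardizes DFBETAS, the deletion identity behind (1.11) can be verified by brute force: the change in β̂ⱼ from deleting case i equals rᵢⱼ rᵢ / [(1 − hᵢᵢ) Σₖ r²ₖⱼ]. A sketch with simulated data (hypothetical, not the hills data), assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(3)
n, j = 12, 1                        # check the j-th coefficient of a 3-column model
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta                                  # residuals r_i
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)     # hat-matrix diagonal

# r_ij: residuals from regressing column j on the other columns
others = np.delete(X, j, axis=1)
zj = X[:, j] - others @ np.linalg.lstsq(others, X[:, j], rcond=None)[0]

# deletion changes in beta_j, computed the slow way
dfbeta = np.array([
    beta[j] - np.linalg.lstsq(np.delete(X, i, 0), np.delete(y, i), rcond=None)[0][j]
    for i in range(n)
])

# closed-form counterpart: elementwise product of 1/(1 - h_ii), r_i, and r_ij
closed = (1.0 / (1.0 - h)) * e * zj / (zj @ zj)
print(np.allclose(dfbeta, closed))  # True
```

The standardized DFBETAS additionally divide by a per-case scale estimate, which is why the text's plots agree only up to standardization.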
Influence on both regression parameter estimates

Figure 1.10(e) displays the direction of maximum curvature for directed
influence on β̂. Plot (f) is the direction of largest curvature under the constraint of
orthogonality to v_max. The two plots are very similar to the directed influence plots
in (c) and (d).
1.5 Basic Perturbations

As an alternative to finding the direction of maximum curvature, we may
utilize basic perturbations. The i-th basic perturbation is the same as the null
perturbation except for an additive change of 1 in its i-th element. For example, the
first basic perturbation is ω₀ + (1, 0, ..., 0)′, and corresponds to moving a distance
of 1 along the first axis in Ω. Typically, the curvatures for each of the q basic
perturbations would be computed and then plotted together.

As an example, consider applying the variance perturbation to a linear
regression and examining directed influence on β̂. In this case, the i-th basic
curvature, C_{b_i}, is a local influence analog of the i-th Cook's Distance, because it
corresponds to reweighting a single observation. It is shown in Section 2.7.2.1 that
the i-th basic perturbation for this scheme measures influence on the i-th fitted value.
Figure 1.12(a) plots the 35 curvatures for the basic perturbations applied to the
Scottish hills data, while Figure 1.12(b) plots the 35 Cook's Distances. Although
the scales differ, they provide similar information. Both plots draw attention to race
7, but they differ somewhat in their emphasis of races 11 and 18.
Figure 1.12: Hills data; C_{b_i}'s for variance perturbation and Cook's Distances. (a) Basic curvatures; (b) Cook's Distances.

1.6 Brief Literature Review

Local influence techniques have been applied to a number of statistical
models, most notably univariate and multivariate linear models (Cook 1986; Cook
and Weisberg 1991; Lawrance 1991; Schwarzmann 1991; Tsai and Wu 1992; Wu
and Luo 1993a; Kim 1995; Kim 1996c; Tang and Fung 1996; Fung and Tang 1997;
Kim 1998; Rancel and Sierra 1999). A number of articles have also appeared on
applications in generalized linear models (GLMs) (Cook 1986; Thomas and Cook
1989; Thomas and Cook 1990; Thomas 1990; Lee and Zhao 1997) and survival
analysis (Pettitt and Baud 1989; Weissfeld and Schneider 1990; Weissfeld 1990;
Escobar and Meeker 1992; Barlow 1997; Brown 1997). Other applications include
linear mixed models (Beckman et al. 1987; Lesaffre and Verbeke 1998; Lesaffre et al.
1999), principal components analysis (Shi 1997), factor analysis (Jung et al. 1997),
discriminant analysis (Wang et al. 1996), optimal experimental design (Kim 1991),
structural equations (Cadigan 1995; Wang and Lee 1996), growth curve models
(Pan et al. 1996; Pan et al. 1997), spline smoothing (Thomas 1991), Box-Cox
transformations (Lawrance 1988; Kim 1996b), nonlinear models (St. Laurent and
Cook 1993; Wu and Wan 1994), measurement error models (Zhao and Lee 1995; Lee
and Zhao 1996), and elliptical linear regression models (Galea et al. 1997).
Many authors have presented modifications of, extensions to, and theoretical
justifications for the basic methodology (Vos 1991; Schall and Gonin 1991; Schall
and Dunne 1992; Farebrother 1992; Wu and Luo 1993a; Billor and Loynes 1993;
Kim 1996a; Fung and Kwan 1997; Lu et al. 1997; Cadigan and Farrell 1999; Poon
and Poon 1999). Extensions to Bayesian methodology have also been considered
(McCulloch 1989; Lavine 1992; Pan et al. 1996).
1.7 Outline
In Chapter 2, computational formulas for local influence analysis are discussed
and derived for the two perturbation schemes discussed in this chapter. Chapter 3
contains derivations and examples of local influence diagnostics for linear regression
under perturbations to the X matrix. In Chapter 4, the methodology is extended to
a general class of influence measures and estimation techniques. Computational tools
are also provided. In Chapter 5, inference and local influence analysis for prediction
problems are discussed. Techniques for assessing estimation and prediction in linear
mixed models are developed and applied to data. In Chapter 6, some preliminary
results that can be applied to multistage estimation are presented. The chapter
also contains commentary on additional areas of future research. The appendices
contain details for data analyses.
CHAPTER 2
COMPUTATIONAL ISSUES IN LOCAL INFLUENCE ANALYSIS
In this chapter, computational formulas for local influence are reviewed
(Cook 1986; Beckman et al. 1987). To demonstrate the mechanics of the formulas,
detailed derivations of previous results for the regression model (Cook and Weisberg
1991; Schwarzmann 1991) will be presented with minor additions. Next, the issue
of how to obtain objective benchmarks is considered, and some novel solutions
are outlined. Finally, local influence is discussed under reparameterizations of the
perturbation space Ω and the parameter space Θ. By applying previous results
(Escobar and Meeker 1992; Pan et al. 1997), some new diagnostic tools for influence
on fitted values in linear regression are derived.
2.1 Curvatures and the Direction of Maximum Curvature
Cook (1986) showed that the curvature of LD(ω) in direction v can be
expressed as

    C_v = 2v′Δ′(−L̈⁻¹)Δv,                                          (2.1)

where −L̈ is the observed information matrix of the original model,

    L̈ = ∂²L(θ; y)/∂θ ∂θ′, evaluated at θ = θ̂,

and Δ (p × q) has elements

    Δ_ij = ∂²L_ω(θ; y)/∂θ_i ∂ω_j, evaluated at θ = θ̂, ω = ω₀,
for i = 1, . . . ,p and j = 1, . . . ,q. This result is proved in Chapter 4 as a special case
of Theorem 4.2.
The q × q matrix F̈ = 2Δ′(−L̈⁻¹)Δ is the acceleration matrix. It is easily
obtained because Δ is a function of L_ω(θ; y), and −L̈⁻¹ is typically computed
as part of ML estimation since it serves as an estimated asymptotic covariance
matrix for the parameter estimates. Hence, computing the curvature in a particular
direction is straightforward. In addition, the diagonal elements of the acceleration
matrix are the q basic curvatures. Finally, it is well known how to obtain the
maximum value of a quadratic form such as (2.1) (Johnson and Wichern 1992,
pp. 66-68). Specifically, C_max is the largest eigenvalue of F̈, and v_max is the
corresponding eigenvector. Hence, finding C_max and v_max is numerically feasible.
The acceleration matrix can provide additional information. For example, the
second largest eigenvalue is the largest curvature in a direction that is orthogonal
to v_max. Hence, its corresponding eigenvector indicates a direction of large local
influence that is unrelated to the first one. Additional elements of the spectral
decomposition of F̈ can be used in a similar fashion.
A consequence of this derivation is that it shows how local influence
techniques reduce the dimensionality of examining LD(ω) via a small number of
curvatures and directional vectors. It is well known that the number of nonzero
eigenvalues of a symmetric matrix such as F̈ is equal to its rank. Since it is a
product of other matrices, the rank of F̈ is no more than the minimum of their
ranks, which is typically p.
In the regression examples given previously, p = 3 and q = 35, meaning that
the local influence approach can summarize the 35-dimensional surface LD(ω) with
three curvatures and directional vectors. Similarly, LD^[θ₁](ω) was summarized with
just two, as shown in Figures 1.12(c) and (d).
Despite the advantages of the eigenanalysis of F̈, one difficulty that arises
is the occurrence of eigenvalues with multiplicities greater than one. For example,
if C_max = λ₁ has multiplicity greater than 1, then there is no unique v_max to plot.
In these situations it may be more prudent to look at directed influence on single
parameter estimates.
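To make the mechanics concrete, the eigenanalysis described above can be sketched in a few lines of numpy. This is an illustrative sketch, not the dissertation's code: the model (normal linear regression) and the perturbation scheme (additive response perturbation, anticipating Section 2.4) are assumed choices, and Δ and −L̈⁻¹ are built from their closed forms.

```python
import numpy as np

# Illustrative sketch: eigenanalysis of the acceleration matrix
# F = 2 * Delta' (-Ldd^{-1}) Delta for a normal linear regression under an
# additive response perturbation (an assumed, concrete choice of scheme).
rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
r = y - X @ (XtX_inv @ X.T @ y)           # residuals
s2 = r @ r / n                            # ML estimate of sigma^2

# -Ldd^{-1}: estimated asymptotic covariance matrix of (beta_hat, s2)
cov = np.zeros((p + 1, p + 1))
cov[:p, :p] = s2 * XtX_inv
cov[p, p] = 2 * s2**2 / n

# Delta' (q x (p+1)): mixed second partials of the perturbed log-likelihood
Delta_t = np.column_stack([X / s2, r / s2**2])

F = 2 * Delta_t @ cov @ Delta_t.T         # acceleration matrix (q = n here)
eigvals, eigvecs = np.linalg.eigh(F)
C_max, v_max = eigvals[-1], eigvecs[:, -1]
basic_curvatures = np.diag(F)             # the q basic curvatures
```

For this particular scheme the largest eigenvalue works out to 4/σ̂², with v_max proportional to the residuals, consistent with the closed-form results derived later in the chapter.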
2.2 Local Influence Analysis of a Profile Likelihood Displacement
The profile likelihood displacement, LD^[θ₁](ω), was presented in Section
1.2 as a measure of influence directed through θ̂₁. Cook (1986) showed that the
acceleration matrix can be expressed as

    F̈^[θ₁] = −2Δ′(L̈⁻¹ − B₂₂)Δ,                                    (2.2)

where

    B₂₂ = [ 0    0
            0    L̈₂₂⁻¹ ],

and L̈₂₂ comes from the partition

    L̈ = [ L̈₁₁   L̈₁₂
          L̈₂₁   L̈₂₂ ],

where

    L̈_ij = ∂²L(θ; y)/∂θ_i ∂θ_j′, evaluated at θ = θ̂.
This result is proved in Chapter 4 as a special case of Theorem 4.2.
Beckman et al. (1987) considered directing influence on a single parameter
estimate. They noted that in this case, the acceleration matrix is the outer product
of a vector, and hence has only one nonzero eigenvalue. Without loss of generality,
suppose that we wish to direct influence on the first parameter estimate in θ̂. The
eigensystem is then given by

    λ = 2‖Δ′T‖²,    v ∝ Δ′T,                                      (2.3)

where T is the vector for which TT′ = −(L̈⁻¹ − B₂₂). It can be expressed as

    T = (−c)^(−1/2) [ 1
                      −L̈₂₂⁻¹L̈₂₁ ],                                (2.4)

where c = L̈₁₁ − L̈₁₂L̈₂₂⁻¹L̈₂₁.
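The rank-one result above is easy to verify numerically. The sketch below (illustrative regression data and an assumed additive response perturbation, not the author's code) builds the profile acceleration matrix by brute force from (2.2) and compares its eigensystem with λ = 2‖Δ′T‖² from (2.3)-(2.4).

```python
import numpy as np

# Check (2.3)-(2.4) against a brute-force eigenanalysis of the profile
# acceleration matrix (2.2). Model/perturbation are illustrative assumptions
# (normal regression, additive response perturbation).
rng = np.random.default_rng(1)
n, p = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
r = y - X @ (XtX_inv @ X.T @ y)
s2 = r @ r / n

cov = np.zeros((p + 1, p + 1))            # -Ldd^{-1}
cov[:p, :p] = s2 * XtX_inv
cov[p, p] = 2 * s2**2 / n
Ldd = -np.linalg.inv(cov)
Delta_t = np.column_stack([X / s2, r / s2**2])

# Profile acceleration matrix for the first parameter estimate, per (2.2)
B22 = np.zeros_like(Ldd)
B22[1:, 1:] = np.linalg.inv(Ldd[1:, 1:])
F_prof = -2 * Delta_t @ (np.linalg.inv(Ldd) - B22) @ Delta_t.T

# Rank-one shortcut, per (2.3)-(2.4): TT' = -(Ldd^{-1} - B22)
c = Ldd[0, 0] - Ldd[0, 1:] @ np.linalg.solve(Ldd[1:, 1:], Ldd[1:, 0])
T = np.concatenate([[1.0], -np.linalg.solve(Ldd[1:, 1:], Ldd[1:, 0])])
T /= np.sqrt(-c)                          # c < 0 since Ldd is negative definite
lam = 2 * np.linalg.norm(Delta_t @ T) ** 2
```

The brute-force matrix is rank one, and its single nonzero eigenvalue matches λ.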
2.3 Building Blocks of Local Influence for Regression
In this section, the building blocks of F̈ for a multiple linear regression are
constructed. Subsequent sections apply these results to the response and variance
perturbations used in Chapter 1.
2.3.1 Notation
• Unless otherwise noted, the derivations assume the presence of an intercept
in the model so that the residuals sum to zero. Centering and scaling
the regressors tends to facilitate interpretation, but is not necessary.
• The number of regressors is k = p − 1.
• H = X(X′X)⁻¹X′.
• X_i represents the i-th column of X, and x_i′ represents the i-th row of X.
• r = (I_n − H)y represents the residuals.
• r_i represents the residuals from fitting a model to the i-th predictor using
the other predictors. These can be considered as values of an adjusted
predictor (i.e., adjusted for relationships with other predictors). Note:
Hr_i = r_i and (I − H)r_i = 0.
• X₍ᵢ₎ represents the X matrix with the i-th column removed. Similarly,
β₍ᵢ₎ is the regression parameter vector with the i-th element removed, and
Δ₍ᵢ₎ is Δ without the i-th row.
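The adjusted predictors in this list are simple to compute; the sketch below (with made-up data) also confirms the stated identities Hr_i = r_i and (I − H)r_i = 0.

```python
import numpy as np

# Sketch of the "adjusted predictor": residuals from regressing column i of X
# on the remaining columns. Data are illustrative.
rng = np.random.default_rng(2)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
H = X @ np.linalg.inv(X.T @ X) @ X.T

i = 1                                       # adjust the first non-intercept column
X_rest = np.delete(X, i, axis=1)
H_rest = X_rest @ np.linalg.inv(X_rest.T @ X_rest) @ X_rest.T
r_i = X[:, i] - H_rest @ X[:, i]            # adjusted predictor r_i
```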
2.3.2 Acceleration Matrix
The ML estimates of β are the same as those of ordinary least squares,
β̂ = (X′X)⁻¹X′y, and the ML estimate of σ² is σ̂² = r′r/n. The estimated
asymptotic variance-covariance matrix of the MLEs is

    −L̈⁻¹ = [ σ̂²(X′X)⁻¹   0
              0′           2σ̂⁴/n ].

Partitioning Δ′ into [Δ′_β  Δ′_σ²], the acceleration matrix is given by

    F̈ = 2Δ′(−L̈⁻¹)Δ
      = 2σ̂²Δ′_β(X′X)⁻¹Δ_β + (4σ̂⁴/n)Δ′_σ²Δ_σ².
2.3.3 Acceleration Matrix for Directed Influence on σ̂²
The acceleration matrix for directed influence on σ̂² will utilize the matrix
−(L̈⁻¹ − B₂₂), which is given by

    −(L̈⁻¹ − B₂₂) = [ 0    0
                      0′   2σ̂⁴/n ].

This implies that the acceleration matrix for directed influence on σ̂² is

    F̈^[σ²] = −2Δ′(L̈⁻¹ − B₂₂)Δ
            = (4σ̂⁴/n)Δ′_σ²Δ_σ².                                   (2.5)

Noting that Δ′_σ² is in fact a q × 1 vector, there will be a single eigenvalue and
eigenvector given by

    λ₁ = (4σ̂⁴/n)‖Δ′_σ²‖²,    v₁ ∝ Δ′_σ².

These quantities are the maximum curvature and its direction, respectively.
2.3.4 Acceleration Matrix for Directed Influence on β̂
When directing influence on β̂, we have

    −(L̈⁻¹ − B₂₂) = [ σ̂²(X′X)⁻¹   0
                      0′           0 ].

The resulting acceleration matrix is

    F̈^[β] = 2σ̂²Δ′_β(X′X)⁻¹Δ_β.                                    (2.6)

Note that this implies that directed local influence on β̂ is equivalent to local
influence for β when treating σ² as known and equal to its MLE.
2.3.5 Acceleration Matrix for Directed Influence on a Single Regression Parameter
By applying the general results of Beckman et al. (1987), we can construct T
(2.4) for directed influence on β̂₁ in a regression model. Using results for the inverse
of a partitioned matrix, TT′ = −(L̈⁻¹ − B₂₂) can be expressed as

    TT′ = (σ̂²/d) [ 1                       −X₁′X₍₁₎(X₍₁₎′X₍₁₎)⁻¹                    0
                   −(X₍₁₎′X₍₁₎)⁻¹X₍₁₎′X₁   (X₍₁₎′X₍₁₎)⁻¹X₍₁₎′X₁X₁′X₍₁₎(X₍₁₎′X₍₁₎)⁻¹   0
                   0′                      0′                                     0 ],   (2.7)

where d = X₁′X₁ − X₁′X₍₁₎(X₍₁₎′X₍₁₎)⁻¹X₍₁₎′X₁.
We denote (X₍₁₎′X₍₁₎)⁻¹X₍₁₎′X₁ by γ₁, as it is the vector of regression parameter
estimates from modeling X₁ as a linear combination of the other explanatory
variables. Noting that d = ‖r₁‖², this simplifies the notation of (2.7) greatly:

    TT′ = (σ̂²/‖r₁‖²) [ 1     −γ₁′    0
                       −γ₁   γ₁γ₁′   0
                       0     0′      0 ].                         (2.8)

Using these results, we find

    T = (σ̂/‖r₁‖) [ 1
                   −γ₁
                   0 ],                                           (2.9)

and this leads to the following expressions for the maximum curvature and its
direction when directing influence on β̂₁:

    C_max = 2‖Δ′T‖² = (2σ̂²/‖r₁‖²)‖Δ′_β₁ − Δ′_β₍₁₎γ₁‖²,    v_max ∝ Δ′T ∝ Δ′_β₁ − Δ′_β₍₁₎γ₁,

where Δ′_β₁ and Δ′_β₍₁₎ denote the columns of Δ′ corresponding to β₁ and to the
remaining regression parameters, respectively.
The next two sections apply these results to document more completely the
two perturbation schemes given in Chapter 1.
2.4 Perturbations to the Response in Regression
The results of this section closely follow those of Schwarzmann (1991),
with only minor additions. As in Chapter 1, a set of additive perturbations is
introduced into the observed data to examine the influence of measurement error in
the response. An n x 1 vector of perturbations is introduced such that the perturbed
likelihood is based on Y + ω ∼ N(Xβ, σ²I_n). The null perturbation is ω₀ = 0_n.
Equivalent results can be obtained by perturbing the model of the mean to be
Xβ − ω, that is, by considering a model assuming Y ∼ N(Xβ − ω, σ²I_n). Hence,
the measurement error perturbation can be interpreted as a type of mean-shift
outlier model; each perturbation is a "stand-in" for an additional parameter that
could correct or exacerbate lack-of-fit.
2.4.1 Building Blocks
The perturbed likelihood is

    L_ω(θ; y) ∝ −(n/2) log σ² − (1/2σ²)(y − Xβ + ω)′(y − Xβ + ω)
              = −(n/2) log σ² − (1/2σ²)(ỹ + ω)′(ỹ + ω)
              = −(n/2) log σ² − (1/2σ²)(ỹ′ỹ + 2ω′ỹ + ω′ω),

where ỹ = y − Xβ. The next step is construction of Δ. Partial derivatives with
respect to ω are as follows:

    ∂L_ω(θ; y)/∂ω = −(1/2σ²)(2ỹ + 2ω)
                  = −(1/σ²)(ỹ + ω)
                  = −(1/σ²)(y − Xβ + ω).

Second partial derivatives are then

    ∂²L_ω(θ; y)/∂ω ∂β′ = (1/σ²)X,    ∂²L_ω(θ; y)/∂ω ∂σ² = (1/σ⁴)(y − Xβ + ω).

Evaluating at the original model yields

    ∂²L_ω(θ; y)/∂ω ∂β′ = (1/σ̂²)X,    ∂²L_ω(θ; y)/∂ω ∂σ² = (1/σ̂⁴)r.

From these quantities, the n × p matrix Δ′ can be constructed:

    Δ′ = (1/σ̂²) [ X   (1/σ̂²)r ].

Finally, the acceleration matrix is

    F̈ = 2Δ′(−L̈⁻¹)Δ
      = (2/σ̂²)H + (4/(nσ̂⁴))rr′.                                   (2.10)
2.4.2 Directed Influence on σ̂²
The acceleration matrix for the profile likelihood displacement for σ² is
simply the second term of (2.10):

    F̈^[σ²] = (4/(nσ̂⁴))rr′.                                        (2.11)

Solving the characteristic equations (F̈^[σ²] − λI)v = 0 directly, the single
eigensolution is

    C_max = λ_max = 4/σ̂²,    v_max ∝ r.

Therefore, as mentioned in Chapter 1, v_max for directed influence on σ̂² is
proportional to the residuals.
2.4.3 Directed Influence on β̂_j
Without loss of generality, consider directed influence on β̂₁. Rather than
find the acceleration matrix, we proceed directly to determining the sole eigenvector,
proportional to Δ′T, and the corresponding curvature 2‖Δ′T‖², as per (2.3) and
(2.9):

    Δ′T = (1/σ̂²) [ X₁   X₍₁₎   (1/σ̂²)r ] (σ̂/‖r₁‖) [ 1
                                                     −γ₁
                                                     0 ]
        = (1/(σ̂‖r₁‖)) [X₁ − X₍₁₎γ₁]
        = (1/(σ̂‖r₁‖)) r₁,

and

    C_max = 2‖Δ′T‖² = (2/(σ̂²‖r₁‖²))‖r₁‖² = 2/σ̂².

Therefore, the direction of maximum curvature for directed influence on β̂₁ is
proportional to the first adjusted predictor.
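A brute-force numeric check of this claim, with illustrative data: the profile acceleration matrix is built directly from (2.2) rather than from the shortcut, and its eigensolution is compared with 2/σ̂² and the adjusted regressor.

```python
import numpy as np

# Brute-force check of Section 2.4.3: under the response perturbation, directed
# influence on a slope estimate has C_max = 2/s2 and v_max proportional to that
# predictor's adjusted regressor. Illustrative data.
rng = np.random.default_rng(3)
n, p = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
r = y - X @ (XtX_inv @ X.T @ y)
s2 = r @ r / n

cov = np.zeros((p + 1, p + 1))             # -Ldd^{-1}
cov[:p, :p] = s2 * XtX_inv
cov[p, p] = 2 * s2**2 / n
Ldd = -np.linalg.inv(cov)
Delta_t = np.column_stack([X / s2, r / s2**2])

j = 1                                      # direct influence on beta_1
keep = [i for i in range(p + 1) if i != j]
B = np.zeros_like(Ldd)
B[np.ix_(keep, keep)] = np.linalg.inv(Ldd[np.ix_(keep, keep)])
F_prof = -2 * Delta_t @ (np.linalg.inv(Ldd) - B) @ Delta_t.T

eigvals, eigvecs = np.linalg.eigh(F_prof)
C_max, v_max = eigvals[-1], eigvecs[:, -1]

X_rest = np.delete(X, j, axis=1)
gamma = np.linalg.lstsq(X_rest, X[:, j], rcond=None)[0]
r_j = X[:, j] - X_rest @ gamma             # adjusted regressor
```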
2.4.3.1 Directed influence on β̂
The acceleration matrix for the profile likelihood displacement for β̂ is the
first term of (2.10):

    F̈^[β] = (2/σ̂²)H.                                              (2.12)

This matrix has a single nonzero eigenvalue of 2/σ̂² with multiplicity k, implying that
there are an infinite number of directional vectors that have maximum curvature.
Although not discussed by Schwarzmann (1991), one set of orthogonal solutions can
be motivated by considering the canonical form of the model:

    Y = Zα + ε,                                                   (2.13)
where

    Z = Xφ,    α = φ′β,    φ = [φ₁ ⋯ φ_k],

and φ_i is the i-th eigenvector of X′X. Because φφ′ = I_k, this is simply a
reparameterization of the original regression model. The new regressors,
Z₁ = Xφ₁, . . . , Z_k = Xφ_k, are known as the principal components, and they are
mutually orthogonal. In addition, each satisfies the characteristic equations of F̈^[β]:

    (F̈^[β] − (2/σ̂²)I)Z_i = (2/σ̂²)(HXφ_i − Xφ_i)
                         = (2/σ̂²)(Xφ_i − Xφ_i)
                         = 0.                                     (2.14)

Therefore, the standardized principal component regressors can serve as an
orthogonal set of directional vectors, each of which has maximum curvature. Note
also that the k adjusted regressors, r₁, . . . , r_k, are also directions of maximum
curvature, but they do not constitute an orthogonal set.
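The property in (2.14) is straightforward to verify numerically. In this sketch (illustrative data, not the dissertation's code), every principal-component regressor Z_i attains the maximum curvature 2/σ̂², and the Z_i are mutually orthogonal:

```python
import numpy as np

# Verify (2.14): each principal-component regressor z_i = X phi_i satisfies
# F^[beta] z_i = (2/s2) z_i, i.e. attains the maximum curvature. Illustrative data.
rng = np.random.default_rng(4)
n, p = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

XtX = X.T @ X
H = X @ np.linalg.inv(XtX) @ X.T
r = y - H @ y
s2 = r @ r / n

F_beta = (2 / s2) * H                     # acceleration matrix (2.12)
_, phi = np.linalg.eigh(XtX)              # eigenvectors of X'X
Z = X @ phi                               # principal-component regressors
```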
2.4.3.2 Influence on θ̂
The acceleration matrix F̈ given in (2.10) has two nonzero eigenvalues: 4/σ̂²
and 2/σ̂², with the second one having multiplicity k. The first eigenvector is ∝ r,
and the set of k principal components can be used to complete an orthogonal set of
eigenvectors. Confirmation of these solutions is straightforward.
2.5 Unequal Measurement Precision Applied to Regression
Cook introduced this perturbation in his original (1986) paper on local
influence, and the results in this section come from his work. An n × 1 vector
of perturbations is introduced such that the perturbed likelihood is based upon
Y ∼ N(Xβ, σ²D(1/ω)), where D(1/ω) denotes a diagonal matrix with i-th element
1/ω_i. The null perturbation is ω₀ = 1_n. The n basic perturbations would be the local
versions of the n Cook's distances (Cook 1977; Cook 1979). In addition, the n basic
perturbations applied to LD^[β_j](ω) are local influence analogs of the n DFBETAS
for β̂_j.
A second interpretation of this perturbation scheme comes from the idea
of case weights. Case weights are used in an analysis when each observation has
been sampled with unequal probability. Under disproportionate sampling, cases are
weighted by the inverse of their sampling probability in order to obtain unbiased
regression parameter estimates. In the normal linear model, this weighting is
numerically equivalent to perturbing the error variances. Hence, this perturbation
scheme can be used to examine sensitivity to the assumption of random sampling.
2.5.1 Building Blocks
To begin, we develop the perturbed likelihood, which is

    L_ω(θ; y) ∝ −(n/2) log σ² − (1/2σ²)(y − Xβ)′D(ω)(y − Xβ)
              = −(n/2) log σ² − (1/2σ²) ỹ′D(ω)ỹ.

The next step is construction of Δ. Partial derivatives with respect to β are as
follows:

    ∂L_ω(θ; y)/∂β = −(1/2σ²)(−2X′D(ω)y + 2X′D(ω)Xβ)
                  = (1/σ²)X′D(ω)ỹ,

and the partial derivative with respect to σ² is

    ∂L_ω(θ; y)/∂σ² = −(n/2σ²) + (1/2σ⁴) ỹ′D(ω)ỹ.

Second partial derivatives are then

    ∂²L_ω(θ; y)/∂β ∂ω′ = (1/σ²)X′D(ỹ),    ∂²L_ω(θ; y)/∂σ² ∂ω′ = (1/2σ⁴)(ỹ • ỹ)′,

and evaluating at the original model yields

    ∂²L_ω(θ; y)/∂β ∂ω′ = (1/σ̂²)X′D(r),    ∂²L_ω(θ; y)/∂σ² ∂ω′ = (1/2σ̂⁴)r′_sq,

where • denotes the Hadamard (elementwise) product, implying that r_sq = r • r is
an n × 1 vector with elements r_i². Combining these, we can write

    Δ′ = (1/σ̂²) [ D(r)X   (1/2σ̂²)r_sq ],

and

    F̈ = (2/σ̂²)D(r)HD(r) + (1/(nσ̂⁴)) r_sq r′_sq.                    (2.15)
2.5.2 Directed Influence on σ̂²
The acceleration matrix for the profile likelihood displacement for σ² is
simply the second term of (2.15):

    F̈^[σ²] = (1/(nσ̂⁴)) r_sq r′_sq.                                 (2.16)

Solving the characteristic equations directly, the single eigensolution is

    C_max = λ_max = ‖r_sq‖²/(nσ̂⁴),    v_max ∝ r_sq.

This suggests that σ̂² is most affected by variance perturbations when they are
scaled according to the squared residuals.
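A numeric sketch (illustrative data, assumed regression model) confirming that the second term of (2.15) has the single eigensolution just stated:

```python
import numpy as np

# Check (2.16): directed influence on s2 under the variance perturbation has
# C_max = ||r_sq||^2/(n s2^2) and v_max proportional to the squared residuals.
rng = np.random.default_rng(6)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
r = y - X @ (XtX_inv @ X.T @ y)
s2 = r @ r / n
r_sq = r * r

# Second term of (2.15), assembled from the sigma^2 column of Delta'
d_s2 = r_sq / (2 * s2**2)
F_s2 = 2 * (2 * s2**2 / n) * np.outer(d_s2, d_s2)

eigvals, eigvecs = np.linalg.eigh(F_s2)
C_max, v_max = eigvals[-1], eigvecs[:, -1]
```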
2.5.3 Directed Influence on β̂_j
Without loss of generality, consider directed influence on β̂₁. Again we will
proceed directly to determining the sole eigenvector, proportional to Δ′T, and the
corresponding curvature 2‖Δ′T‖², as per (2.3) and (2.9):

    Δ′T = (1/σ̂²) [ D(r)X₁   D(r)X₍₁₎   (1/2σ̂²)r_sq ] (σ̂/‖r₁‖) [ 1
                                                                 −γ₁
                                                                 0 ]
        = (1/(σ̂‖r₁‖)) D(r)(X₁ − X₍₁₎γ₁)
        = (1/(σ̂‖r₁‖)) (r • r₁),                                   (2.17)

and

    C_max = 2‖Δ′T‖² = (2/(σ̂²‖r₁‖²)) ‖r • r₁‖².                     (2.18)
This suggests that maximum local influence of the perturbation on β̂₁ is
achieved by perturbing cases that have both large residuals and extreme values for
the adjusted covariate. Cook (1986) provided additional details of how this result
compares to the information in a detrended added variable plot.
Also, we may compare the information in v_max with that provided by the
DFBETAS diagnostic (Belsley et al. 1980). Recall that the value of DFBETAS_ij
is the standardized difference between β̂_j and β̂_j,(i), the estimate when deleting the
i-th observation. The formula is given by

    DFBETAS_ij = c_ij r_i / (s‖c_j‖(1 − h_ii)),

where r_i is the i-th residual, c_j is the j-th column of X(X′X)⁻¹, c_ij is the i-th element
of c_j, and s² = r′r/(n − p) is the mean squared-error estimate of σ². The vector c_j can be
written in terms of the j-th adjusted regressor by partitioning (X′X)⁻¹ as per (2.7).
For j = 1,

    c₁ = (1/‖r₁‖²) [ X₁   X₍₁₎ ] [ 1
                                   −γ₁ ]
       = (1/‖r₁‖²) r₁.

Using this equality, the vector of n DFBETAS for β̂_j is proportional to

    diag{ 1/(1 − h₁₁), . . . , 1/(1 − h_nn) } (r • r_j).           (2.19)

Since v_max ∝ r • r_j, observations with large leverage stand out more in a plot of the
n DFBETAS for β̂_j than in a plot of v_max. This algebraic relationship between
v_max and DFBETAS has not been noted by previous authors.
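The algebra behind (2.19) can be checked by brute-force case deletion. In this sketch (illustrative data), the unscaled deletion changes β̂_j − β̂_j,(i) equal (r • r_j)_i/((1 − h_ii)‖r_j‖²) exactly:

```python
import numpy as np

# Check the algebra behind (2.19): case-deletion changes in beta_j are
# proportional to (1/(1-h_ii)) * (r * r_j), the quantity plotted for DFBETAS,
# while v_max is proportional to r * r_j alone. Illustrative data.
rng = np.random.default_rng(7)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([0.0, 1.0, 1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
H = X @ XtX_inv @ X.T
r = y - H @ y
h = np.diag(H)

j = 1
X_rest = np.delete(X, j, axis=1)
gamma = np.linalg.lstsq(X_rest, X[:, j], rcond=None)[0]
r_j = X[:, j] - X_rest @ gamma             # adjusted regressor for beta_j

delta = np.empty(n)                        # beta_j - beta_j(i) by deletion
for i in range(n):
    b_i = np.linalg.lstsq(np.delete(X, i, axis=0), np.delete(y, i), rcond=None)[0]
    delta[i] = beta_hat[j] - b_i[j]

pred = (r * r_j) / (1 - h) / (r_j @ r_j)   # closed form of the same changes
```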
2.5.4 Directed Influence on β̂
The acceleration matrix for directed influence on β̂ is the first term of (2.15):

    F̈^[β] = (2/σ̂²) D(r)HD(r).

The eigensystem of the matrix does not have a closed form. However, some idea of
the directed influence of this perturbation on β̂ can be ascertained by utilizing basic
perturbations. The i-th basic perturbation corresponds to modifying the variance of
only the i-th case, and its curvature is the i-th diagonal element of F̈^[β] (Cook 1986):

    C_i = (2/σ̂²) h_ii r_i².                                        (2.20)
(a) v₂ for θ̂    (b) v₃ for θ̂
Figure 2.1: Hills data; variance perturbation
This is similar to the i-th Cook's Distance (Cook 1977):

    D_i = h_ii r_i² / (k(1 − h_ii)²s²).
2.5.5 Influence on θ̂
The eigensystem of F̈ is not closed form, but it can be found numerically.
As an example, consider the perturbation of the error variances for the Scottish
hills data. There are three nonzero eigenvalues for F̈: 14.82, 4.91, and 0.37. The
eigenvector associated with the largest eigenvalue was plotted in Figure 1.10(a). It
is similar to v_max for directed influence on σ̂², as given in Figure 1.10(b). The other
two eigenvectors for F̈ are plotted in Figure 2.1.
2.6 Benchmarks for Perturbation Analyses
An analyst using local influence diagnostics will want to know when a
curvature is "too big" and when an element of v_max is "too big." Calibration of local
influence diagnostics was the topic of much debate in the discussion and rejoinder
to Cook (1986), and there is still no consensus opinion on the subject. Much of the
difficulty lies in the fact that the perturbation is introduced into the problem by the
analyst. Because the technique does not involve hypothesis testing, critical values
are not appropriate. However, there have been attempts to develop rule-of-thumb
benchmarks, as exist for Cook's Distance and other common influence diagnostics.
In this section, we discuss some general difficulties in developing benchmarks
for perturbation analyses, as well as some issues specific to the local influence
approach. Although it is argued that all sensitivity is "relative," some simple
methods for assessing the size of curvatures and the size of the elements of
directional vectors are presented.
2.6.1 The Analyst's Dilemma
As mentioned before, the perturbation is introduced by the analyst, and this
implies that he/she must decide whether the effects of perturbation are important,
much the way a researcher decides what effect size is important. Consider a situation
in which the researcher is actually able to give the size or stochastic behavior of
the perturbation. Here, we may concretely measure the effects of perturbation
by whether or not the conclusion of a hypothesis test changes. Alternatively, the
effects can be measured by whether or not the perturbed estimates are outside an
unperturbed confidence ellipsoid. However, despite having a concrete measure of the
perturbation's effects, assessing the practical relevance of the results still falls to
the analyst.
2.6.2 Calibrating Curvatures in Local Influence
In addition to the general limitations of a perturbation analysis, an analyst
using the local influence approach must also contend with the fact that the behavior
and size of the perturbations are undetermined apart from the basic formulation.
Loynes (1986) and Beckman et al. (1987) noted that curvatures do not have
the invariance properties needed for a catchall method of benchmarking. This
necessitates a reliance upon relative influence.
2.6.2.1 1-to-1 coordinate changes in Ω
Following Loynes (1986) and Beckman et al. (1987), let us consider 1-to-1
coordinate changes in the perturbation space Ω. That is, a new perturbation scheme
is constructed with elements ω*_i = k(ω_i) for i = 1, . . . , q, for a smooth function k.
The new scheme's perturbation, ω*, will not yield the same curvature as ω.
Specifically,

    C_{v,ω*} = [∂k(ω_i)/∂ω_i]⁻² C_{v,ω},

with the derivative evaluated at ω_i = ω_{i0}, where ω_{i0} denotes the null perturbation
value for ω_i. As an example, consider perturbing the variance of y_i to be σ²/ω_i²
rather than σ²/ω_i. Here, k(ω_i) = ω_i^{1/2}, resulting in new curvatures that are
[(1/2)ω_i^{−1/2}]⁻² evaluated at ω_i = 1, that is, 4
times the original ones. Hence, despite the fact that the two versions of Ω can
produce the same set of perturbed models, the curvature in identical directions will
change. This implies that a universal benchmark for curvatures is not possible.
Indeed, given the arbitrariness of scaling the perturbation, it should come as no
surprise that curvatures cannot be compared across perturbation schemes. However,
the direction of maximum curvature is unchanged under 1-to-1 coordinate changes
in Ω.
2.6.2.2 Interpreting curvatures
In this section, a simple way to assess curvatures from the same perturbation
scheme is given. First, (a²/2)C_v is a Taylor series approximation to LD(ω₀ + av)
(Lawrance 1991; Escobar and Meeker 1992). Also, assuming that n is large, an
asymptotic 100(1 − α)% likelihood-based confidence region for θ is given by the
values of θ such that

    2[L(θ̂; y) − L(θ; y)] ≤ χ²_{p,1−α}.

Thus, if the curvature in direction v is larger than twice χ²_{p,1−α}, then a perturbation
at a distance of one from ω₀ in direction v moves the parameter estimates to the
edge of the asymptotic confidence region.
To further develop this idea, let us define the size of a perturbation to be
its Euclidean distance from the null perturbation (i.e., ‖ω − ω₀‖). Suppose further
that we have perturbations in two different directions: ω_A in direction v_A and ω_B
in direction v_B. If C_A is k times larger than C_B, then perturbation ω_B must be
√k times larger than ω_A to have the same effect on the likelihood displacement.
Equivalently, in order to perturb the estimates to the edge of the confidence ellipsoid,
it would be necessary to travel √k times farther in direction v_B in Ω than in direction
v_A. This provides a simple interpretation for the relative size of two curvatures.
2.6.2.3 Internal benchmarks for curvatures
Building on the interpretation given above, curvatures can be benchmarked
based on their relative size to some sort of "average" curvature for a given scheme
and fitted model. For instance, the average of the q basic curvatures for the fitted
model can serve as a benchmark. Alternatively, the average curvature could be
computed by (1) allowing the squared elements of v to have a Dirichlet(α₁, . . . , α_q)
distribution with all α_j equal to one, and then (2) finding the expected curvature.
This would be equivalent to finding the average approximate displacement at a
radius of √2 from ω₀ (Cook 1986). If such a benchmark is difficult to compute, a
Monte Carlo estimate can be calculated. Because these benchmarks are computed
using curvatures from the fitted model, they result in comparisons that are internal
to the fitted model. The conformal normal curvature of Poon and Poon (1999)
would also be an internal way to assess the sizes of curvatures.
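If the Dirichlet-based expected curvature is not available in closed form, the Monte Carlo estimate mentioned above takes only a few lines. In this sketch the acceleration matrix is a made-up positive semidefinite matrix; with squared elements drawn from a Dirichlet(1, . . . , 1) distribution and independent random signs attached, the expected curvature equals the average of the q basic curvatures, so the two internal benchmarks agree here.

```python
import numpy as np

# Monte Carlo estimate of the expected curvature when the squared elements of v
# follow a Dirichlet(1,...,1) distribution (random signs attached). The matrix
# F below is an arbitrary illustrative acceleration matrix.
rng = np.random.default_rng(5)
q = 10
A = rng.normal(size=(q, 3))
F = 2 * A @ A.T                            # rank-3, positive semidefinite

basic_avg = np.mean(np.diag(F))            # average of the q basic curvatures

m = 20000
v2 = rng.dirichlet(np.ones(q), size=m)     # squared elements of v
V = rng.choice([-1.0, 1.0], size=(m, q)) * np.sqrt(v2)   # unit directions
curvatures = np.einsum('ij,jk,ik->i', V, F, V)
mc_avg = curvatures.mean()
```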
2.6.2.4 External benchmarks for curvatures
It is entirely possible that all of the curvatures of a fitted model are large
due to the nature of the data set, and using internal benchmarks will not alert
the researcher to such a phenomenon. To address this, a benchmark that is not
dependent on the observed data is needed. To this end, we might utilize the
expected value of the curvature in direction v, using θ̂ in place of the unknown
parameters θ. The expected curvatures can then be plotted along with the observed
curvatures (Cadigan and Farrell 1999). If the expectation of C_max is difficult to
obtain, Monte Carlo simulations could be used to estimate it. As an alternative,
Schall and Dunne (1992) constructed a means of external benchmarking by deriving
a scaled curvature using results on parameter orthogonality (Cox and Reid 1987).
2.6.2.5 Internal versus external benchmarks
Internal and external benchmarks are complementary tools, since each one
provides different diagnostic information. Comparing curvatures from the same
fitted model highlights relative sensitivity given the data and the design. Comparing
curvatures to their expectations highlights whether the sensitivities are expected
given the experimental or sampling design.
2.6.3 Calibration of Directional Vectors
For a directional vector from the eigensystem of F̈, the analyst must compare
the individual elements in the vector to determine which aspects of the perturbation
are most important. Typically, it is suggested that the analyst simply look for
elements that stand out as being large relative to the rest. Alternately, a vector
whose elements are all equal in size can serve as a means of comparison. That is to
say, if each of the q elements of the perturbation are contributing equally, then each
is equal to either 1/√q or −1/√q. A particular element whose absolute value is larger
than, say, 2 or 3 times that of equally contributing elements (e.c.e.) would then be
of interest. Hence, the inclusion of horizontal lines at ±2/√q and ±3/√q in the plot of
directional vectors may be useful.
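A tiny sketch of this rule (the vector below is made up): elements of a unit-length directional vector are flagged when they exceed 2 or 3 times the equally-contributing size 1/√q.

```python
import numpy as np

# Flag elements of a directional vector that exceed 2x or 3x the size of
# "equally contributing elements" (1/sqrt(q)). The vector is illustrative.
v = np.array([0.05, -0.10, 0.85, 0.12, -0.08, 0.30, 0.15, -0.20, 0.25, 0.10])
v = v / np.linalg.norm(v)                  # directional vectors have unit length
q = v.size
ece = 1 / np.sqrt(q)
flag_2x = np.flatnonzero(np.abs(v) > 2 * ece)
flag_3x = np.flatnonzero(np.abs(v) > 3 * ece)
```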
2.7 Local Assessment Under Reparameterizations
Pan et al. (1997) show that v_max is invariant to any 1-to-1 measurable
transformation of the parameter space Θ. In fact, for θ = h(φ), the entire
eigensystem of F̈_φ is the same as that of F̈_θ. However, directed influence on the
estimates of the redefined parameters can then be examined. Using similar steps to
Pan et al. (1997) and Escobar and Meeker (1992), we can write update formulas
for Δ_φ and L̈_φ and then calculate directed influence on a subset of the redefined
parameter estimates. Using partial differentiation, the building blocks of the
acceleration matrix under the parameterization φ can be written:

    Δ_φ = (∂θ′/∂φ) Δ_θ,                                           (2.21)

    L̈_φ = (∂θ′/∂φ) L̈_θ (∂θ/∂φ′),                                  (2.22)

where all partial derivatives are evaluated at the fitted model. From this, the
acceleration matrix for directed influence on φ̂₁ can be obtained from the partition
φ = (φ₁′, φ₂′)′:

    F̈^[φ₁] = −2Δ′_φ(L̈_φ⁻¹ − B_φ22)Δ_φ,                            (2.23)

where

    B_φ22 = [ 0    0
              0    L̈_φ22⁻¹ ],

L̈_φ22 comes from the partition

    L̈_φ = [ L̈_φ11   L̈_φ12
            L̈_φ21   L̈_φ22 ],

and all partial derivatives are again evaluated at the fitted model.
Although Pan et al. (1997) suggested that (2.23) is invariant to the choice
of φ₂, they did not provide a proof. To show this result, consider another
reparameterization ψ for which the first p₁ elements are the same as those of φ:

    ψ = [ ψ₁ ]  =  [ φ₁ ]
        [ ψ₂ ]     [ ψ₂ ].

This implies that

    ∂φ′/∂ψ = [ I_p₁   0
               0      ∂φ₂′/∂ψ₂ ],                                 (2.24)

and this leads to the following expression for the lower right submatrix of L̈_ψ:

    L̈_ψ22 = (∂φ₂′/∂ψ₂) L̈_φ22 (∂φ₂/∂ψ₂′),                          (2.25)

where all partial derivatives are evaluated at the original model. Meanwhile, the
acceleration matrix for directed influence on ψ̂₁ is

    F̈^[ψ₁] = −2Δ′_ψ(L̈_ψ⁻¹ − B_ψ22)Δ_ψ
            = −2Δ′_φ (∂φ/∂ψ′)(L̈_ψ⁻¹ − B_ψ22)(∂φ′/∂ψ) Δ_φ.         (2.26)

The invariance of the acceleration matrix for directed influence on φ̂₁ to the choice
of φ₂ is proved by showing that (2.26) is equivalent to (2.23). Using the inverse of
(2.25),

    (∂φ/∂ψ′)(L̈_ψ⁻¹ − B_ψ22)(∂φ′/∂ψ) = L̈_φ⁻¹ − B_φ22.             (2.27)

Substituting (2.27) into (2.26) completes the result.
2.7.1 Directed Influence on a Linear Combination of Parameter Estimates
As mentioned in Escobar and Meeker (1992), a specific application of these
results is to consider directed influence on a linear combination of parameter
estimates, x′θ̂. There are many ways to reparameterize the model such that x′θ is
one of the new parameters. For example, assuming that the first element of x is
nonzero, φ can be defined as Aθ, where

    A = [ x₁       x₍₁₎′
          0_{p−1}   I_{p−1} ],

and x₁ and x₍₁₎ are the first and remaining elements of x, respectively (Escobar
and Meeker 1992). Using equation (2.23), the acceleration matrix for directed
influence on φ̂₁ = x′θ̂ can be computed, and the resulting diagnostics are invariant
to the choice of φ₂, . . . , φ_p. By using a prudent choice of new parameters, we can
obtain easy-to-compute formulas for the single eigensolution of F̈^[x′θ], as per the
following theorem.
Theorem 2.1
The maximum curvature and its direction for directed influence on x′θ̂ are given by

    C_max = (2/‖k₁‖²) ‖Δ′_θ L̈_θ⁻¹ x‖²,    v_max ∝ Δ′_θ L̈_θ⁻¹ x,

where k₁ = (−L̈_θ)^{−1/2} x.
Proof of Theorem 2.1
To begin, consider a model with parameterization φ = K′(−L̈_θ)^{1/2}θ, where
K = [ k₁  k₂  ⋯  k_p ] is a full rank p × p matrix with

    k₁ = (−L̈_θ)^{−1/2}x,    k_i′k_j = 0 for i ≠ j,    k_i′k_i = k₁′k₁ for i = 2, . . . , p.

By construction, K′K = ‖k₁‖²I_p.
Using the update formulas in (2.21) and (2.22), it follows that

    Δ_φ = (1/‖k₁‖²) K′(−L̈_θ)^{−1/2} Δ_θ,    L̈_φ = −(1/‖k₁‖²) I_p.

Next, we show that directed influence on φ̂₁ = x′θ̂ is not a function of k₂, . . . , k_p.
From equation (2.23), we find

    F̈^[φ₁] = −2Δ′_φ(L̈_φ⁻¹ − B_φ22)Δ_φ
            = −2Δ′_φ(−‖k₁‖² e₁e₁′)Δ_φ
            = (2/‖k₁‖²) Δ′_θ (−L̈_θ)^{−1/2} k₁k₁′ (−L̈_θ)^{−1/2} Δ_θ
            = (2/‖k₁‖²) Δ′_θ L̈_θ⁻¹ x x′ L̈_θ⁻¹ Δ_θ,               (2.28)

where e₁ denotes the first standard basis vector. The single eigensolution of this
rank-one matrix yields the final solution for C_max and v_max. □
The construction of φ is worth some note. The premultiplication of θ by
K′(−L̈_θ)^{1/2} orthogonalizes the parameters, in the sense that the off-diagonal elements
of the observed information matrix for φ are zero. (Note that the orthogonalization
is with respect to the observed information matrix as in Tawn (1987), rather
than the expected information as in Cox and Reid (1987).) First, α = (−L̈_θ)^{1/2}θ
is a parameterization with an observed information matrix equal to the identity
matrix. Then, K is constructed to make orthogonal contrasts of α, with the first
column corresponding to the linear combination of interest. Finally, by scaling the
columns of K to all have the same length as k₁, K becomes a scalar multiple of
an orthonormal matrix, and this leads to the algebraic simplifications. Also, if
x′θ = θ_j, the local influence diagnostics provided by the theorem are the same as
those provided by (2.3), as per Beckman et al. (1987).
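A hedged numeric check of Theorem 2.1 in the illustrative regression/response-perturbation setting: taking x = e_j (so that x′θ = θ_j), the theorem must agree with the Beckman et al. single-parameter route, and both give C_max = 2/σ̂² here. Signs are carried via the positive definite matrix −L̈⁻¹.

```python
import numpy as np

# Numeric check of Theorem 2.1 with x = e_j, where it must agree with the
# single-parameter result (2.3)-(2.4). Regression + response perturbation are
# illustrative assumptions; signs are carried via the positive definite -Ldd^{-1}.
rng = np.random.default_rng(8)
n, p = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
r = y - X @ (XtX_inv @ X.T @ y)
s2 = r @ r / n

cov = np.zeros((p + 1, p + 1))             # -Ldd^{-1}
cov[:p, :p] = s2 * XtX_inv
cov[p, p] = 2 * s2**2 / n
Delta_t = np.column_stack([X / s2, r / s2**2])

j = 1
x = np.zeros(p + 1)
x[j] = 1.0                                 # x'theta = beta_1

# Theorem 2.1: C_max = 2 ||Delta'(-Ldd^{-1})x||^2 / (x'(-Ldd^{-1})x)
w = Delta_t @ (cov @ x)
C_thm = 2 * (w @ w) / (x @ cov @ x)

# Beckman et al. route: lambda = 2 ||Delta' T||^2, TT' = -(Ldd^{-1} - B22)
Ldd = -np.linalg.inv(cov)
keep = [i for i in range(p + 1) if i != j]
sub = np.linalg.solve(Ldd[np.ix_(keep, keep)], Ldd[keep, j])
c = Ldd[j, j] - Ldd[j, keep] @ sub
T = np.zeros(p + 1)
T[j] = 1.0
T[keep] = -sub
T /= np.sqrt(-c)
C_beck = 2 * np.linalg.norm(Delta_t @ T) ** 2
```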
2.7.2 Application to Regression
As an example, consider perturbing the variances of a regression model and
examining influence on a linear combination of the regression parameter estimates,
μ̂(x) = x′β̂. From (2.28), we find

    F̈^[x′β] = (2/(σ̂²κ)) D(r)X(X′X)⁻¹xx′(X′X)⁻¹X′D(r),             (2.29)

where κ = x′(X′X)⁻¹x. Further,

    v_max ∝ D(r)X(X′X)⁻¹x ∝ Cov(Xβ̂, x′β̂) • r,                     (2.30)

    C_max = (2/(σ̂²κ)) ‖x′(X′X)⁻¹X′D(r)‖²,                          (2.31)

where Cov(Xβ̂, x′β̂) is the vector of empirical covariances between Xβ̂ and x′β̂.
2.7.2.1 A DFFITS-type measure
Using the variance perturbation and directed influence on x′β̂, a local
influence analog of DFFITS can be defined. Recall that DFFITS_i is defined as the
externally studentized change in the i-th fitted value when deleting the i-th observation
(Belsley et al. 1980):

    DFFITS_i = (μ̂(x_i) − μ̂(x_i)_{[i]}) / (s_{[i]} h_ii^{1/2}),

where μ̂(x_i)_{[i]} is the estimate of μ(x_i) with the i-th observation deleted and s²_{[i]} is the
mean squared-error estimate of σ² with the i-th observation deleted. A large value
for DFFITS indicates that the observation is somewhat "self-predicting". It can be
shown that DFFITS_i simplifies to

    DFFITS_i = h_ii^{1/2} r_i / (s_{[i]}(1 − h_ii)).
A similar measure can be obtained by examining local influence on
μ(x_i) = x_i′β when using the i-th basic variance perturbation. That is, only the
i-th observation is reweighted, and the curvature for directed influence on μ̂(x_i) is
examined. The curvature for the i-th basic perturbation is the i-th diagonal element of
F̈^[x_i′β]:

    C_{b_i} = BasicFITS_i = 2h_ii r_i²/σ̂².                         (2.32)

A large value of BasicFITS indicates that the perturbation of the observation's
precision/weighting greatly influences its fitted value. Thus it is a local influence
analog of DFFITS. Coincidentally, BasicFITS_i is equivalent to the i-th basic curvature
for the acceleration matrix, as given in (2.20). That implies that for the variance
perturbation, the n basic curvatures of LD(ω) are measuring influence on the n
fitted values.
While BasicFITS_i is the i-th basic curvature when directing influence on μ̂(x_i),
we can also define MaxFITS_i to be C_max for directed influence on μ̂(x_i). Using
(2.31),

    C_{max_i} = MaxFITS_i = (2/(σ̂²h_ii)) Σ_{j=1}^{n} h_ij² r_j².   (2.33)

By definition, the i-th MaxFITS can be no smaller than the i-th BasicFITS. While
BasicFITS provides information about which observation's perturbation influences
its fitted value, MaxFITS provides information about how sensitive each fitted value
is relative to the others.
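Both diagnostics are cheap to compute from the hat matrix. A sketch with illustrative data, including the inequality MaxFITS_i ≥ BasicFITS_i noted above:

```python
import numpy as np

# BasicFITS (2.32) and MaxFITS (2.33) for the variance perturbation,
# computed from the hat matrix. Data are illustrative.
rng = np.random.default_rng(9)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
r = y - H @ y
s2 = r @ r / n

basic_fits = 2 * h * r**2 / s2             # (2.32): 2 h_ii r_i^2 / s2
max_fits = 2 * (H**2 @ r**2) / (s2 * h)    # (2.33): (2/(s2 h_ii)) sum_j h_ij^2 r_j^2
```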
Example
Consider examining influence on the fitted values for each of the 35 covariate
profiles in the Scottish hills data set. Figure 2.2(a) displays the 35 BasicFITS (i.e.,
each of the basic curvatures for directed influence on the corresponding μ̂(x_i)).
Race 7 is indicated as being influential on its fitted value, while race 18 is also shown
Figure 2.2: Hills data; \sigma^2 D(1/\omega) perturbation; influence on fitted values.
(a) BasicFITS; (b) MaxFITS; (c) BasicFITS (stars connected by solid line) and
MaxFITS (triangles connected by dashed line).
to be quite influential on its fitted value. Plot (b) shows the 35 MaxFITS (i.e., the
C_{max}'s for directed influence on each of the 35 \mu(x_i)). Since no point stands out
as being large, the plot indicates that no fitted value is particularly sensitive to
variance perturbations. Plot (c) shows both sets of diagnostics. The BasicFITS
values are stars joined by a solid line, while the MaxFITS values are triangles joined
by a dashed line. The closeness of the MaxFITS and BasicFITS values for race 7
indicates that this observation is somewhat self-predicting.
CHAPTER 3
PERTURBATIONS TO THE X MATRIX IN REGRESSION
In this chapter, the influence that infinitesimal errors in the X matrix can
have on parameter estimates in a linear regression is addressed. First, previous
results for single column and single row perturbations are reviewed. This is followed
by a generalization of these results to perturbing all elements in the X matrix. It is
shown that the eigensolution of F is related to principal components regression,
collinearity, and outliers. Finally, new graphical techniques are presented with the
results applied to the Scottish hills data set and an educational outcomes data set.
3.1 Perturbations to Explanatory Variables
Letting k denote the number of predictors. Cook (1986) proposed that an
nA;sized perturbation can be incorporated into the design matrix in an additive
fashion. These perturbations represent fixed errors in the regressors and should
not be confused with random errorsin variables techniques (Fuller 1987). The
perturbation scheme can be used to assess the effects of small rounding or truncation
errors in the explanatory variables. Upon perturbation, the X matrix becomes
X + \begin{pmatrix}
\omega_1 & \omega_{n+1} & \omega_{2n+1} & \cdots & \omega_{(k-1)n+1} \\
\omega_2 & \omega_{n+2} & \omega_{2n+2} & \cdots & \omega_{(k-1)n+2} \\
\vdots   & \vdots       & \vdots        &        & \vdots \\
\omega_n & \omega_{2n}  & \omega_{3n}   & \cdots & \omega_{kn}
\end{pmatrix}.
The use of a single subscript implies w = vec(W^), where the vec operator stacks the
columns of the matrix into a single column. When particular elements of W greatly
affect parameter estimation, it indicates that the corresponding element of X is
influential. This allows the analyst to identify specific combinations of explanatory
variable values that influence estimation. As will be shown in Section 3.4, sensitivity
of the regression parameter estimates to these kinds of perturbations is greater when
there is collinearity amongst the regressors and when the model is calibrated to fit
the observed data too closely.
3.2 Perturbations to a Single Column
The special case of only perturbing a single column of X was discussed
in Cook (1986), Thomas and Cook (1989), and Cook and Weisberg (1991). This
section recreates their results in detail.
3.2.1 Building Blocks
Here and in the following sections, d_i represents a vector with a 1 in the i-th
position and zeros elsewhere. The length of the vector depends on the context of its
use. Note that for d_i of size k,

d_i'(X'X)^{-1} d_i = \frac{1}{\|r_i\|^2}   (3.1)

and

X(X'X)^{-1} d_i = \frac{1}{\|r_i\|^2}\, r_i,   (3.2)

where r_i contains the residuals from regressing X_i on the other columns of X.
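Identities (3.1) and (3.2) are easy to confirm numerically; the following sketch checks both on a simulated design matrix (the data and variable names are illustrative only).

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 30, 4
X = rng.normal(size=(n, k))

i = 2                                    # column singled out by d_i
d = np.zeros(k); d[i] = 1.0

# residuals from regressing X_i on the remaining columns
Xo = np.delete(X, i, axis=1)
gamma, *_ = np.linalg.lstsq(Xo, X[:, i], rcond=None)
r_i = X[:, i] - Xo @ gamma

XtX_inv = np.linalg.inv(X.T @ X)

lhs1 = d @ XtX_inv @ d                   # left side of (3.1)
rhs1 = 1.0 / (r_i @ r_i)

lhs2 = X @ XtX_inv @ d                   # left side of (3.2)
rhs2 = r_i / (r_i @ r_i)
```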
Without loss of generality, suppose perturbations are added to the n values
of the first predictor (i.e., X_1 becomes X_1 + \omega). The perturbed likelihood is

L_\omega(\theta; y) \propto -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(y - X\beta - \beta_1\omega)'(y - X\beta - \beta_1\omega)   (3.3)
= -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\left[ (y - X\beta)'(y - X\beta) - 2\beta_1\omega'(y - X\beta) + \beta_1^2\,\omega'\omega \right].
The next step is the construction of \Delta. Partial derivatives with respect to \omega are as
follows:

\partial L_\omega(\theta; y)/\partial\omega = \frac{\beta_1}{\sigma^2}(y - X\beta - \beta_1\omega).

Second partial derivatives are then

\partial^2 L_\omega(\theta; y)/\partial\omega\,\partial\beta_1 = \frac{1}{\sigma^2}\left( y - X_{(1)}\beta_{(1)} - 2\beta_1(X_1 + \omega) \right)
\partial^2 L_\omega(\theta; y)/\partial\omega\,\partial\beta_{(1)}' = -\frac{\beta_1}{\sigma^2}\, X_{(1)}
\partial^2 L_\omega(\theta; y)/\partial\omega\,\partial\sigma^2 = -\frac{\beta_1}{\sigma^4}(y - X\beta - \beta_1\omega),

where X_{(1)} and \beta_{(1)} denote X and \beta with the first column and element removed.
Evaluating at the original model yields

\partial^2 L_\omega(\theta; y)/\partial\omega\,\partial\beta_1 \big|_{\hat\theta,\,\omega_0} = \frac{1}{\hat\sigma^2}(r - \hat\beta_1 X_1), \qquad
\partial^2 L_\omega(\theta; y)/\partial\omega\,\partial\sigma^2 \big|_{\hat\theta,\,\omega_0} = -\frac{\hat\beta_1}{\hat\sigma^4}\, r,

and from these quantities we construct the n \times p matrix

\Delta' = \frac{1}{\hat\sigma^2}\left[\; r d_1' - \hat\beta_1 X \;\Big|\; -\frac{\hat\beta_1}{\hat\sigma^2}\, r \;\right],   (3.4)

where d_1 is a k \times 1 vector. It follows that the acceleration matrix is given by

F = \frac{2}{\hat\sigma^2}(r d_1' - \hat\beta_1 X)(X'X)^{-1}(d_1 r' - \hat\beta_1 X') + \frac{4\hat\beta_1^2}{n\hat\sigma^4}\, r r'.   (3.5)
3.2.2 Directed Influence on \hat\sigma^2

The acceleration matrix for the profile likelihood displacement for \sigma^2 is given
by

F^{[\sigma^2]} = \frac{4\hat\beta_1^2}{n\hat\sigma^4}\, r r'.   (3.6)

Solving the characteristic equations (F^{[\sigma^2]} - \lambda I)v = 0 directly, the single
eigensolution is

C_{max} = \frac{4\hat\beta_1^2}{\hat\sigma^2}, \qquad v_{max} \propto r.   (3.7)

Therefore the direction of maximum curvature for directed influence on \hat\sigma^2 is always
proportional to the residuals regardless of which column of X is perturbed, but the
corresponding curvature is dependent upon the effect size, |\hat\beta_1|.
3.2.3 Directed Influence on \hat\beta_1

Using the results of Beckman et al. (1987), the sole eigenvector and the
corresponding curvature can be obtained in closed form as per (2.3):

v_{max} \propto (r d_1' - \hat\beta_1 X)(X'X)^{-1} d_1 = \frac{1}{\|r_1\|^2}\left( r - \hat\beta_1 r_1 \right) \propto r - \hat\beta_1 r_1,   (3.8)

and

C_{max} = \frac{2}{\hat\sigma^2}\,\frac{\|r - \hat\beta_1 r_1\|^2}{\|r_1\|^2} = \frac{2}{\hat\sigma^2}\left( \frac{\|r\|^2}{\|r_1\|^2} + \hat\beta_1^2 \right).   (3.9)
The direction of maximum curvature is a weighted average of the residuals and the
adjusted regressor. As noted in Cook and Weisberg (1991), \hat\beta_1 r_1 = \hat{y} - \hat{y}_{(1)}, where
\hat{y}_{(1)} is the vector of fitted values when X_1 is excluded from the model. Using this
result, we find v_{max} \propto 2r - r_{(1)}, where r_{(1)} is the vector of residuals when modeling
y using all regressors except X_1. This indicates that the direction is dominated by
the residuals, a fact which is demonstrated by examples later in this chapter. Cook
and Weisberg (1991) discussed how v_{max} relates to detrended added-variable plots.
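The identity \hat\beta_1 r_1 = \hat{y} - \hat{y}_{(1)}, and hence the representation v_{max} \propto 2r - r_{(1)}, can be confirmed numerically. The sketch below does so on simulated data (all names and data are illustrative).

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 40, 3
X = rng.normal(size=(n, k))
y = X @ rng.normal(size=k) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ beta
r = y - yhat

# adjusted regressor: residuals of X_1 on the remaining columns
Xo = X[:, 1:]
g, *_ = np.linalg.lstsq(Xo, X[:, 0], rcond=None)
r1 = X[:, 0] - Xo @ g

# fit the model without X_1
b_o, *_ = np.linalg.lstsq(Xo, y, rcond=None)
yhat_o = Xo @ b_o
r_o = y - yhat_o

v_max = r - beta[0] * r1                 # direction of maximum curvature (3.8)
```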
3.2.4 Directed Influence on Other Regression Parameter Estimates
Thomas and Cook (1989) examined the influence that perturbations in one
column of X can have on the regression parameter estimate corresponding to a
different column. Without loss of generality, suppose we perturb the first column of
X and look at the effects on the last regression parameter estimate \hat\beta_k. Again using
the results of Beckman et al. (1987) we can obtain the maximum curvature and its
direction:

v_{max} \propto (r d_1' - \hat\beta_1 X)(X'X)^{-1} d_k = -\frac{1}{\|r_k\|^2}\left( \hat\gamma_{k1}\, r + \hat\beta_1\, r_k \right) \propto \hat\gamma_{k1}\, r + \hat\beta_1\, r_k,   (3.10)
where \hat\gamma_{k1} is the regression parameter estimate corresponding to X_1 when predicting
X_k with the other columns of the X matrix. The maximum curvature is given by
C_{max} = \frac{2}{\hat\sigma^2}\left( \hat\gamma_{k1}^2\,\frac{\|r\|^2}{\|r_k\|^2} + \hat\beta_1^2 \right).   (3.11)
When the partial correlation between X_1 and X_k is zero, C_{max} achieves its minimum
value of 2\hat\beta_1^2/\hat\sigma^2, yielding v_{max} \propto r_k. In the event that X_k is a linear combination of the
other regressors with nonzero \hat\gamma_{k1}, C_{max} = \infty and v_{max} \propto r. Between these two
extremes, the direction of maximum curvature is a weighted average of the residuals
and the adjusted k-th regressor. These quantities are used to detect outliers and
measure leverage, respectively. If the partial correlation between X_1 and X_k is as
large as that between X_1 and y, then the two vectors are equally weighted. Note
that this weighting is invariant to scale changes in the columns of the design matrix
as well as the response.
3.2.4.1 Directed influence on \hat\beta

Without loss of generality, we again consider perturbing the first column.
The rank-k acceleration matrix is

F^{[\beta]} = \frac{2}{\hat\sigma^2}(r d_1' - \hat\beta_1 X)(X'X)^{-1}(d_1 r' - \hat\beta_1 X').   (3.12)

The first eigenvalue of this matrix is unique, and yields a maximum curvature and
corresponding direction of

C_{max} = \frac{2}{\hat\sigma^2}\left( \frac{\|r\|^2}{\|r_1\|^2} + \hat\beta_1^2 \right) \qquad \text{and} \qquad v_1 \propto r - \hat\beta_1 r_1.

Hence, the maximum curvature and its direction are identical to that of directed
influence on \hat\beta_1. Confirmation of this first eigensolution follows. Using X'r = 0,
r'r_1 = 0, and X'r_1 = \|r_1\|^2 d_1,

(d_1 r' - \hat\beta_1 X')(r - \hat\beta_1 r_1) = \left( \|r\|^2 + \hat\beta_1^2\|r_1\|^2 \right) d_1,

so that

F^{[\beta]}(r - \hat\beta_1 r_1) = \frac{2}{\hat\sigma^2}\left( \|r\|^2 + \hat\beta_1^2\|r_1\|^2 \right)(r d_1' - \hat\beta_1 X)(X'X)^{-1} d_1
= \frac{2}{\hat\sigma^2}\left( \frac{\|r\|^2}{\|r_1\|^2} + \hat\beta_1^2 \right)(r - \hat\beta_1 r_1).
The second eigenvalue is 2\hat\beta_1^2/\hat\sigma^2 and has multiplicity k - 1. As mentioned earlier,
this implies that there is no unique set of orthogonal eigenvectors to complete the
solution.
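This eigenstructure, the unique first eigenvalue of (3.12) and a second eigenvalue of multiplicity k - 1, can be checked numerically. The sketch below uses the expressions as reconstructed in this section on simulated data (illustrative only).

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 50, 4
X = rng.normal(size=(n, k))
y = X @ np.array([1.5, -1.0, 0.5, 2.0]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ beta
sigma2 = r @ r / n                        # MLE

d1 = np.zeros(k); d1[0] = 1.0
M = np.outer(r, d1) - beta[0] * X         # r d_1' - beta_1 X
F = (2 / sigma2) * M @ np.linalg.solve(X.T @ X, M.T)   # eq. (3.12)

# adjusted first regressor
Xo = X[:, 1:]
g, *_ = np.linalg.lstsq(Xo, X[:, 0], rcond=None)
r1 = X[:, 0] - Xo @ g

lam = np.sort(np.linalg.eigvalsh(F))[::-1]           # descending
lam1_theory = (2 / sigma2) * (r @ r / (r1 @ r1) + beta[0]**2)
lam2_theory = 2 * beta[0]**2 / sigma2
```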
3.2.5 Influence on \hat\beta and \hat\sigma^2

The acceleration matrix for examining both \hat\beta and \hat\sigma^2 simultaneously has
k + 1 = p nonzero eigenvalues. The form of the eigensolution is unknown. My
numerical studies indicate that there are three unique nonzero eigenvalues, one of
which has multiplicity k - 1. The one with higher multiplicity appears to be the
second largest of the three, with a value of 2\hat\beta_1^2/\hat\sigma^2.
Example
Huynh (1982) and Fung and Kwan (1997) examined data from 20 randomly
sampled schools from the Mid-Atlantic and New England States. The mean verbal
test score (VSCORE) of each school's 6th graders is regressed on staff salaries per
pupil (SALARY), % of white-collar fathers (WCOLR), a socioeconomic composite
score (SES), mean teacher's verbal score (TSCORE), and mean mother's educational
level (MOMED). Appendix A contains model fitting information and diagnostics for
the regression analysis, with all variables centered and scaled.

Table 3.1: School data; Pearson correlation coefficients

          SALARY  WCOLR   SES    TSCORE  MOMED  VSCORE
SALARY     1.00
WCOLR      0.18    1.00
SES        0.22    0.82   1.00
TSCORE     0.50    0.05   0.18    1.00
MOMED      0.19    0.92   0.81    0.12   1.00
VSCORE     0.19    0.75   0.92    0.33   0.73   1.00
Diagnostic plots of the residuals, externally studentized residuals, Cook's
Distances, diagonal elements of the hat matrix, and DFBETAS appear in Figure
3.1. Ordinary least squares regression indicates cases 3 and 18 are outliers, although
Huynh (1982) also identified observation 11 as a mild outlier. This data set also
has some redundancy among the predictors SES, WCOLR and MOMED (Table
3.1). Although the X matrix is not ill-conditioned, the association between SES
and MOMED accounts for the unexpected negative slope estimate for MOMED.
Table 3.2 shows the nonzero curvatures (eigenvalues) for perturbing
single columns of the X matrix. Comparing the columns of the table provides
information on which regressor's measurement is most influential. Here we see
that the measurement of SES is typically the most influential, both for regression
parameter estimates and for \hat\sigma^2. Comparing the rows provides information on which
parameter estimates are most influenced by these kinds of perturbations. Here we
see that WCOLR and MOMED estimates are the most sensitive. These are the two
regressors that are highly correlated with each other and with SES. It comes as no
surprise that the results for collinear explanatory variables are more susceptible to
measurement errors of this nature.
Figure 3.1: School data; diagnostic plots. (a) Residuals; (b) Externally studentized
residuals; (c) Cook's Distances; (d) Hat diagonals; (e) DFBETAS for SALARY;
(f) DFBETAS for WCOLR; (g) DFBETAS for SES; (h) DFBETAS for TSCORE;
(i) DFBETAS for MOMED.
Table 3.2: School data; curvatures using X_i + \omega perturbation

Estimates in LD(\omega)                  Column being perturbed
or LD^{[\cdot]}(\omega)   SALARY            WCOLR             SES               TSCORE           MOMED
\hat\theta            \lambda_1=4.144   \lambda_1=20.156  \lambda_1=50.008  \lambda_1=6.658  \lambda_1=19.065
                      \lambda_2=0.440   \lambda_2=0.847   \lambda_2=19.002  \lambda_2=1.413  \lambda_2=0.932
                      \lambda_3=0.094   \lambda_3=0.071   \lambda_3=14.441  \lambda_3=0.599  \lambda_3=0.091
\hat\beta             \lambda_1=3.357   \lambda_1=18.534  \lambda_1=26.445  \lambda_1=4.432  \lambda_1=17.292
                      \lambda_2=0.440   \lambda_2=0.847   \lambda_2=19.002  \lambda_2=1.413  \lambda_2=0.932
\hat\beta_{SALARY}    3.357             0.956             19.008            2.126            0.951
\hat\beta_{WCOLR}     0.458             18.534            19.890            1.585            10.869
\hat\beta_{SES}       0.443             2.957             26.445            1.501            1.640
\hat\beta_{TSCORE}    1.129             1.859             19.220            4.432            1.295
\hat\beta_{MOMED}     0.444             11.590            19.324            1.480            17.292
\hat\sigma^2          0.881             1.693             38.004            2.825            1.865
Since perturbing the SES variable is most influential, examining directional
vectors for this perturbation is of interest. The element corresponding to case 18 is
the most noticeable in the plot of v_{max} for the likelihood displacement (Figure 3.2).
This observation has a large residual and is indicated as being influential in the
various diagnostic plots in Figure 3.1. Also note the similarity of Figure 3.2 to the
residuals (Figure 3.1(a)).

Plots (a)-(e) of Figure 3.3 show the directions of maximum curvature when
directing influence on each of the individual regression parameter estimates. In
general the plots indicate different elements of X_3 to be influential on different \hat\beta_i.
Finally, Figure 3.3(f) provides the direction of maximum curvature when directing
influence on \hat\sigma^2. Recall that this vector is proportional to the residuals.
Figure 3.2: School data; X_3 + \omega perturbation; influence on \hat\theta
3.3 Perturbations to a Single Row
The special case of only perturbing a single row of X was discussed in
Thomas and Cook (1989) in the context of generalized linear models. Here, detailed
derivations for regression are presented. The results indicate that single row
perturbations have limited diagnostic value.
3.3.1 Building Blocks

Without loss of generality, suppose perturbations are added to the k regressor
values of the first observation (i.e., x_1' becomes x_1' + \omega'). For d_1 being of order n,
the perturbed likelihood is

L_\omega(\theta; y) \propto -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(y - X\beta - d_1\omega'\beta)'(y - X\beta - d_1\omega'\beta)   (3.13)
= -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\left[ (y - X\beta)'(y - X\beta) - 2(y_1 - x_1'\beta)\sum_{j=1}^{k}\beta_j\omega_j + \Big(\sum_{j=1}^{k}\beta_j\omega_j\Big)^2 \right].

Partial derivatives with respect to \omega are as follows:

\partial L_\omega(\theta; y)/\partial\omega = \frac{\beta}{\sigma^2}\,(y_1 - x_1'\beta - \omega'\beta).   (3.14)
Figure 3.3: School data; X_3 + \omega perturbation; directed influence plots.
(a) v_{max} for \hat\beta_1; (b) v_{max} for \hat\beta_2; (c) v_{max} for \hat\beta_3;
(d) v_{max} for \hat\beta_4; (e) v_{max} for \hat\beta_5; (f) v_{max} for \hat\sigma^2.
Since the null perturbation is 0, evaluating at \omega_0 simplifies the expression. We then
proceed to take second partial derivatives:

\partial^2 L_\omega(\theta; y)/\partial\omega\,\partial\beta' \big|_{\omega=\omega_0} = \frac{1}{\sigma^2}\left[ (y_1 - x_1'\beta) I_k - \beta x_1' \right]
\partial^2 L_\omega(\theta; y)/\partial\omega\,\partial\sigma^2 \big|_{\omega=\omega_0} = -\frac{y_1 - x_1'\beta}{\sigma^4}\,\beta.

Evaluating at \hat\theta and combining the portions for the regression parameter
estimates we obtain

\partial^2 L_\omega(\theta; y)/\partial\omega\,\partial\beta' \big|_{\hat\theta,\,\omega_0} = \frac{1}{\hat\sigma^2}\left( r_1 I_k - \hat\beta x_1' \right), \qquad
\partial^2 L_\omega(\theta; y)/\partial\omega\,\partial\sigma^2 \big|_{\hat\theta,\,\omega_0} = -\frac{r_1}{\hat\sigma^4}\,\hat\beta,

where r_1 is the first residual. From these quantities we can construct the k \times p
matrix

\Delta' = \frac{1}{\hat\sigma^2}\left[\; r_1 I_k - \hat\beta x_1' \;\Big|\; -\frac{r_1}{\hat\sigma^2}\,\hat\beta \;\right].   (3.15)

The acceleration matrix is then given by

F = \frac{2}{\hat\sigma^2}\left[ h_{11}\hat\beta\hat\beta' + r_1^2 (X'X)^{-1} - r_1\hat\beta x_1'(X'X)^{-1} - r_1 (X'X)^{-1} x_1\hat\beta' \right] + \frac{4 r_1^2}{n\hat\sigma^4}\,\hat\beta\hat\beta'.
3.3.2 Directed Influence on \hat\sigma^2

The acceleration matrix for the profile likelihood displacement for \sigma^2 is given
by

F^{[\sigma^2]} = \frac{4 r_1^2}{n\hat\sigma^4}\,\hat\beta\hat\beta'.   (3.16)

Solving the characteristic equations (F^{[\sigma^2]} - \lambda I)v = 0 directly, the single
eigensolution is

C_{max} = \frac{4 r_1^2 \|\hat\beta\|^2}{n\hat\sigma^4}, \qquad v_{max} \propto \hat\beta.
Notice how the curvature depends upon the residual for the case that is being
perturbed, but the directional vector is always proportional to \hat\beta regardless of which
row of X is being perturbed. This suggests that v_{max} is not very useful for single
row perturbations. The values of C_{max} for each row can be compared to assess which
row is most influential when perturbed. However, the relative effects only depend
upon the residuals, indicating there is little value added by plotting the curvatures.
3.3.3 Directed Influence on \hat\beta_j

Without loss of generality, we can direct influence on \hat\beta_1:

v_{max} \propto r_1\,\tilde\gamma_1 - (x_{11} - \hat{x}_{11})\,\hat\beta

and

C_{max} = \frac{2}{\hat\sigma^2\,\|X_1 - \hat{X}_1\|^2}\,\left\| r_1\,\tilde\gamma_1 - (x_{11} - \hat{x}_{11})\,\hat\beta \right\|^2,

where \tilde\gamma_1 = (1, -\hat\gamma')', with \hat\gamma the coefficients from regressing X_1 on the other
columns of X, \hat{X}_1 the fitted values from that regression, and \hat{x}_{11} the predicted value
of x_{11} from modeling X_1 as a linear combination of the other regressors. More
generally, the maximum local influence on \hat\beta_j by perturbing row i is

C_{max} = \frac{2}{\hat\sigma^2\,\|X_j - \hat{X}_j\|^2}\,\left\| r_i\,\tilde\gamma_j - (x_{ij} - \hat{x}_{ij})\,\hat\beta \right\|^2.

If the columns of X have all been standardized, then each of the elements of \tilde\gamma_j and
of \hat\beta is less than 1. This suggests that element j of v_{max} is likely to be larger than
the other elements. However, the signs of r_i and (x_{ij} - \hat{x}_{ij})\hat\beta_j can work to cancel
each other out. Further interpretation is suppressed in lieu of the more general
perturbation presented in Section 3.4. Note that the curvature and directional
vector change if one or more of the columns of X are rescaled.

Figure 3.4: School data; maximum curvatures for single row perturbations
3.3.4 Influence on \hat\beta and \hat\theta

The acceleration matrix for directed influence on \hat\beta is

F^{[\beta]} = \frac{2}{\hat\sigma^2}\left[ h_{11}\hat\beta\hat\beta' + r_1^2 (X'X)^{-1} - r_1\hat\beta x_1'(X'X)^{-1} - r_1 (X'X)^{-1} x_1\hat\beta' \right].   (3.17)

The eigensystem of this matrix is unknown, but my numerical examples indicate
that it has k distinct nonzero eigenvalues. The eigensystem of F for the full set of
parameter estimates is also unknown.
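The form (3.17) and the claim of k nonzero eigenvalues can be examined numerically. The sketch below builds the matrix both from the expanded form and from the factored form (r_1 I - \hat\beta x_1')(X'X)^{-1}(r_1 I - x_1\hat\beta') implied by the derivation in Section 3.3.1; the data are simulated and illustrative only.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 30, 4
X = rng.normal(size=(n, k))
y = X @ np.array([1.5, -1.0, 0.5, 2.0]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ beta
sigma2 = r @ r / n
XtX_inv = np.linalg.inv(X.T @ X)

x1 = X[0]                                  # the perturbed row
h11 = x1 @ XtX_inv @ x1
r1 = r[0]

# expanded form of (3.17)
F = (2 / sigma2) * (h11 * np.outer(beta, beta)
                    + r1**2 * XtX_inv
                    - r1 * np.outer(beta, x1 @ XtX_inv)
                    - r1 * np.outer(XtX_inv @ x1, beta))

lam = np.linalg.eigvalsh(F)                # ascending eigenvalues
```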
3.3.5 Example

The values of school 18's regressors are most influential when perturbing rows
of X one at a time, as evidenced by a large maximum curvature (Figure 3.4). For
perturbing row 18, v_{max} is nearly proportional to \hat\beta (Figure 3.5). Note that although
element 3 of v_{max} appears quite large, it is actually less than the benchmark of
2/\sqrt{k} = 2/\sqrt{5} = .894.
Figure 3.5: School data; x_{18}' + \omega' perturbation; v_{max} for \hat\beta
3.4 Perturbations to the Entire Design Matrix

At this point, the general case of perturbing all nk elements of the design
matrix is considered. We use W to denote the perturbations in matrix form:

W = \begin{pmatrix}
\omega_1 & \omega_{n+1} & \omega_{2n+1} & \cdots & \omega_{(k-1)n+1} \\
\omega_2 & \omega_{n+2} & \omega_{2n+2} & \cdots & \omega_{(k-1)n+2} \\
\vdots   & \vdots       & \vdots        &        & \vdots \\
\omega_n & \omega_{2n}  & \omega_{3n}   & \cdots & \omega_{kn}
\end{pmatrix},

and \omega_j to denote the j-th column of W.
3.4.1 Building Blocks

The derivations in this section only require the presence of an intercept in
the model (or centering of the response y). However, for ease of interpretation, we
assume the columns of X have been centered and scaled. The perturbed likelihood
is

L_\omega(\theta; y) = -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(y - X\beta - W\beta)'(y - X\beta - W\beta).   (3.18)
Proceeding in a similar fashion to the derivations for perturbing a single column,
partial derivatives with respect to \omega are as follows:

\partial L_\omega(\theta; y)/\partial\omega = \frac{1}{\sigma^2}\begin{pmatrix}
\beta_1(y - X\beta) - \beta_1 W\beta \\
\vdots \\
\beta_k(y - X\beta) - \beta_k W\beta
\end{pmatrix}.

Since the null perturbation is 0, evaluating at \omega_0 simplifies the expression:

\partial L_\omega(\theta; y)/\partial\omega \big|_{\omega=\omega_0} = \frac{1}{\sigma^2}\begin{pmatrix}
\beta_1(y - X\beta) \\
\beta_2(y - X\beta) \\
\vdots \\
\beta_k(y - X\beta)
\end{pmatrix} = \frac{1}{\sigma^2}\,\beta \otimes (y - X\beta).   (3.19)

We then proceed to take second partial derivatives and evaluate at \hat\theta:

\partial^2 L_\omega(\theta; y)/\partial\omega\,\partial\beta' \big|_{\hat\theta,\,\omega_0} = \frac{1}{\hat\sigma^2}\left( I_k \otimes r - \hat\beta \otimes X \right)   (3.20)
\partial^2 L_\omega(\theta; y)/\partial\omega\,\partial\sigma^2 \big|_{\hat\theta,\,\omega_0} = -\frac{1}{\hat\sigma^4}\left( \hat\beta \otimes r \right).

From these quantities we can construct the nk \times p matrix \Delta', as given by Cook
(1986):

\Delta' = \frac{1}{\hat\sigma^2}\left[\; I_k \otimes r - \hat\beta \otimes X \;\Big|\; -\frac{1}{\hat\sigma^2}\left( \hat\beta \otimes r \right) \;\right],   (3.21)

where \otimes denotes the Kronecker product (Searle 1982).
where ® denotes the Kronecker product (Searle 1982).
The next step is to construct the acceleration matrix. Here we use the fact
that {A ® B){C ® D) = AC ® BD, assuming comformabiHty of the matrices.
Noting that A = A®\ = 1®A, the acceleration matrix can be written as
F = — (jfc ® r  ^ ® X) {X'Xy^ (ik ® r'  /3' ® X') + ^ (^^' ® rr'^
= ^(^Ik®r$®X^ ((X'X)i ®r'0'® (X'X)^X')
(^^'®rr'y ;
(3.22)
na^
3.4.2 Directed Influence on \hat\sigma^2

The acceleration matrix for the profile likelihood displacement for \hat\sigma^2 is the
second term of (3.22):

F^{[\sigma^2]} = \frac{4}{n\hat\sigma^4}\left( \hat\beta\hat\beta' \otimes rr' \right).   (3.23)

This matrix has a single nonzero eigenvalue, 4\|\hat\beta\|^2/\hat\sigma^2, with corresponding
eigenvector v \propto \hat\beta \otimes r. This solution is confirmed below:

(F^{[\sigma^2]} - \lambda I)v \propto \frac{4}{n\hat\sigma^4}\left( \hat\beta\hat\beta' \otimes rr' \right)(\hat\beta \otimes r) - \frac{4\|\hat\beta\|^2}{\hat\sigma^2}(\hat\beta \otimes r)
= \frac{4\|\hat\beta\|^2\|r\|^2}{n\hat\sigma^4}(\hat\beta \otimes r) - \frac{4\|\hat\beta\|^2}{\hat\sigma^2}(\hat\beta \otimes r)   (3.24)
= 0,

since \|r\|^2 = n\hat\sigma^2.

This result implies maximum local influence on \hat\sigma^2 is achieved by perturbations in
each column that are proportional to the residuals scaled by the corresponding \hat\beta_j.
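This eigensolution can be verified directly. The sketch below constructs (3.23) with Kronecker products and checks the single nonzero eigenvalue and its eigenvector, using simulated, illustrative data.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 20, 3
X = rng.normal(size=(n, k))
y = X @ np.array([1.5, -1.0, 0.5]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ beta
sigma2 = r @ r / n

# F^{[sigma^2]} = (4 / (n sigma^4)) (beta beta' ⊗ r r'), an nk x nk matrix
F = (4 / (n * sigma2**2)) * np.kron(np.outer(beta, beta), np.outer(r, r))

lam = np.linalg.eigvalsh(F)               # ascending eigenvalues
v = np.kron(beta, r)                      # claimed eigenvector, beta ⊗ r
lam_theory = 4 * (beta @ beta) / sigma2   # claimed single nonzero eigenvalue
```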
3.4.3 Directed Influence on \hat\beta

In this section, we derive the full eigensystem of F^{[\beta]}, where

F^{[\beta]} = \frac{2}{\hat\sigma^2}\left( I_k \otimes r - \hat\beta \otimes X \right)(X'X)^{-1}\left( I_k \otimes r' - \hat\beta' \otimes X' \right).

The following theorem provides the full eigensystem of F^{[\beta]} for the X + W
perturbation in a regression analysis. Note that the eigenvalues in the theorem appeared
in Cook (1986), but the eigenvectors appear here for the first time.

Theorem 3.1

Recall that \varphi_j is the j-th eigenvector of X'X and that X\varphi_j is the j-th principal
component. Letting \delta_j denote the j-th eigenvalue of X'X, the k nonzero eigenvalues
and eigenvectors of F^{[\beta]} are

\lambda_j = 2\left( \frac{n}{\delta_{k-j+1}} + \frac{\|\hat\beta\|^2}{\hat\sigma^2} \right), \qquad v_j \propto \varphi_{k-j+1} \otimes r - \hat\beta \otimes X\varphi_{k-j+1},   (3.25)

for j = 1, ..., k.
Proof of Theorem 3.1

Because the algebraic expressions are long, the eigensolutions are confirmed in
two steps. Noting that 1/\delta_{k-j+1} and \varphi_{k-j+1} are the j-th eigensolution of (X'X)^{-1},
and using X'r = 0, r'X\varphi_{k-j+1} = 0, and X'X\varphi_{k-j+1} = \delta_{k-j+1}\varphi_{k-j+1}, we have

\left( I_k \otimes r' - \hat\beta' \otimes X' \right)\left( \varphi_{k-j+1} \otimes r - \hat\beta \otimes X\varphi_{k-j+1} \right)
= \left( \|r\|^2 + \|\hat\beta\|^2\,\delta_{k-j+1} \right)\varphi_{k-j+1},   (3.26)

so that

F^{[\beta]}\left( \varphi_{k-j+1} \otimes r - \hat\beta \otimes X\varphi_{k-j+1} \right)
= \frac{2}{\hat\sigma^2}\,\frac{\|r\|^2 + \|\hat\beta\|^2\,\delta_{k-j+1}}{\delta_{k-j+1}}\left( \varphi_{k-j+1} \otimes r - \hat\beta \otimes X\varphi_{k-j+1} \right)
= 2\left( \frac{n}{\delta_{k-j+1}} + \frac{\|\hat\beta\|^2}{\hat\sigma^2} \right)\left( \varphi_{k-j+1} \otimes r - \hat\beta \otimes X\varphi_{k-j+1} \right).   (3.27)

Hence (F^{[\beta]} - \lambda_j I)v_j = 0, confirming the j-th eigensolution.   (3.28)

Finally, for j \neq j^*, the orthogonality of the solutions follows:

(\varphi_j \otimes r - \hat\beta \otimes X\varphi_j)'(\varphi_{j^*} \otimes r - \hat\beta \otimes X\varphi_{j^*})
= (\varphi_j'\varphi_{j^*})\|r\|^2 - (\varphi_j'\hat\beta)(r'X\varphi_{j^*}) - (\hat\beta'\varphi_{j^*})(\varphi_j'X'r) + \|\hat\beta\|^2\,\varphi_j'X'X\varphi_{j^*}
= 0 - 0 - 0 + \|\hat\beta\|^2\,\delta_{j^*}\,\varphi_j'\varphi_{j^*}
= 0. \square
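Theorem 3.1 lends itself to a direct numerical check: the eigenvalues of F^{[\beta]} can be compared with 2(n/\delta_{k-j+1} + \|\hat\beta\|^2/\hat\sigma^2) and the claimed eigenvectors verified. The sketch below does this on simulated, illustrative data with centered and scaled columns, as the section assumes.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 25, 3
X = rng.normal(size=(n, k))
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)          # centered, scaled columns
y = X @ np.array([1.0, -0.5, 0.25]) + 0.3 * rng.normal(size=n)
y = y - y.mean()

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ beta
sigma2 = r @ r / n

delta, phi = np.linalg.eigh(X.T @ X)       # eigenvalues of X'X, ascending

# I_k ⊗ r - beta ⊗ X, an nk x k matrix
M = np.kron(np.eye(k), r[:, None]) - np.kron(beta[:, None], X)
F = (2 / sigma2) * M @ np.linalg.solve(X.T @ X, M.T)

# nonzero eigenvalues predicted by Theorem 3.1 (descending, since delta ascends)
lam_theory = 2 * (n / delta + (beta @ beta) / sigma2)
```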
3.4.3.1 Interpreting Theorem 3.1

The closed form eigensolution in Theorem 3.1 gives insight into how slight
changes to the observed explanatory variables affect inference. To interpret the
theorem, consider the maximum curvature, as per (3.25):

C_{max} = 2\left( \frac{n}{\delta_k} + \frac{\|\hat\beta\|^2}{\hat\sigma^2} \right).   (3.29)

The eigenvalues of X'X are traditionally used to assess the amount of collinearity
in the regressors. If \delta_k is small, there is redundancy in the explanatory variables,
and this causes the inversion of X'X to be unstable. If this inversion is unstable,
so too are the regression parameter estimates, \hat\beta = (X'X)^{-1}X'y. The first term in
C_{max} is large when \delta_k is small, giving further evidence that the estimates are more
sensitive when collinearity is present. The second term of C_{max} is also interesting.
It can be considered a naive measure of goodness of fit; when the relationships
between the explanatory variables and y are strong, \|\hat\beta\|^2 is large and \hat\sigma^2 is small.
Hence, "better fitting" models are more sensitive to perturbation. However, \|\hat\beta\|^2/\hat\sigma^2
typically grows whenever additional variables are added to the model, indicating
more complicated models are also more sensitive to perturbation. In fact, \|\hat\beta\|^2/\hat\sigma^2
bears some similarity to R^2:

R^2 = \frac{SS_{Reg}}{SS_y} = \frac{\hat\beta'X'X\hat\beta}{y'y}, \qquad y'y = n\hat\sigma^2 + \hat\beta'X'X\hat\beta.
The quantity R^2 is the proportion of variance explained by the model, and serves
as a measure of goodness-of-fit. However, R^2 can be made artificially large by
adding additional explanatory variables, even when they contribute very little to the
explanatory value of the model. The quantity \|\hat\beta\|^2/\hat\sigma^2 behaves similarly: additional
variables decrease the denominator and usually increase the numerator. Of course,
the scale of the regressors is a major factor in the size of \|\hat\beta\|^2, indicating that
scaling the regressors would be prudent for interpretation.
So, we may interpret C_{max} in the following way. The vector of parameter
estimates is more sensitive when (a) there are collinear variables, (b) there are a large
number of explanatory variables, and (c) the model closely fits the data. We are not
surprised to see that the first two phenomena are sources of sensitivity. However,
the suggestion that better-fitting models are more sensitive to measurement error
may at first be disconcerting. When viewed in light of Type I and Type II errors,
this is not the case. Recall that a Type I error occurs when a test is wrongly
declared significant, while a Type II error occurs when a test is wrongly declared
insignificant. Generally, a Type I error is considered the worse of the two. So, if
there are strong relationships between X and y, the formula for C_{max} implies these
might be missed because of measurement error in the regressors. However, if there
is a weak relationship, perturbations to X would not make the relationship appear
strong. In this sense, the regression model is better protected against Type I errors
(concluding there are strong relationships when there are none) than Type II errors
(concluding there are weak relationships when in fact there are strong ones) when
there is measurement error in the regressors.
As a final comment on C_{max}, the term \|\hat\beta\|^2/\hat\sigma^2 may be better described as
an indicator of a closely fitting model rather than a well fitting model. This suggests
that if we construct complicated models to fit the nuances of a data set, the resulting
parameter estimates are more sensitive to rounding and measurement errors in the
explanatory variables.
Now let us consider v_{max} as per Theorem 3.1:

v_{max} \propto \varphi_k \otimes r - \hat\beta \otimes X\varphi_k.   (3.30)

This vector and the other eigenvectors of F^{[\beta]} provide detailed information about
which elements of X most affect the stability of the MLEs. The expression for
v_{max} shows how outliers and leverage in the final principal component regressor
determine precisely what kinds of mismeasurements have large local influence on \hat\beta.
The directional vector can be used to scale W to achieve maximum local influence:

W_{max} \propto \left[\; \varphi_{k1} r - \hat\beta_1 X\varphi_k \;\Big|\; \varphi_{k2} r - \hat\beta_2 X\varphi_k \;\Big|\; \cdots \;\Big|\; \varphi_{kk} r - \hat\beta_k X\varphi_k \;\right].   (3.31)

The elements in the first column are a weighted average of the residuals (r) and the
regressor values of the last principal component regressor (X\varphi_k). The respective
weights are the first original regressor's contribution to the final principal component
(\varphi_{k1}) and the negative of the first regression parameter estimate (-\hat\beta_1).

Alternatively, we can think of the elements in the first row as being a
weighted average of the explanatory variables' contributions to the final principal
component (\varphi_k') and the negative of the parameter estimates (-\hat\beta'). The respective
weights are the first residual (r_1) and the first observation's value for the final
principal component (x_1'\varphi_k).
3.4.3.2 A special case

In order to examine in more detail how the perturbations given in (3.31)
affect \hat\beta, we consider a simple situation for which there are closed forms for the
perturbed parameter estimates. Let us assume that the response and a pair of
regressors have been centered and scaled to have length one, and that the regressors
are orthogonal. The implications of these simplifications are as follows:

\|y\| = \|X_1\| = \|X_2\| = 1, \qquad r'r = 1 - \hat\beta_1^2 - \hat\beta_2^2,
\hat\beta_1 = X_1'y, \qquad \hat\beta_2 = X_2'y, \qquad \delta_1 = \delta_2 = 1, \qquad \varphi = I_2.

The first two eigenvalues of F^{[\beta]} are the same:

\lambda_1 = \lambda_2 = \frac{2}{\hat\sigma^2}\left( \|r\|^2 + \|\hat\beta\|^2 \right) = \frac{2}{\hat\sigma^2},

and we may choose either of the following vectors for v_{max}:

\begin{pmatrix} r - \hat\beta_1 X_1 \\ -\hat\beta_2 X_1 \end{pmatrix} \qquad \text{or} \qquad \begin{pmatrix} -\hat\beta_1 X_2 \\ r - \hat\beta_2 X_2 \end{pmatrix}.

Without loss of generality we choose the first one. Letting X_\omega denote the perturbed
design matrix, we have

X_\omega = X + aW = X + a\left[\; r - \hat\beta_1 X_1 \;\Big|\; -\hat\beta_2 X_1 \;\right]
= \left[\; ar + (1 - a\hat\beta_1)X_1 \;\Big|\; X_2 - a\hat\beta_2 X_1 \;\right] = \left[\; X_{1,\omega} \;\Big|\; X_{2,\omega} \;\right]   (3.32)

for some real number a. Now, consider the effects of perturbation on \hat\beta_1. Assuming
a no-intercept model is still fit after perturbation, the perturbed estimate is given by

\hat\beta_{1,\omega} = \frac{r_{1,\omega}'y}{\|r_{1,\omega}\|^2},   (3.33)
where r_{1,\omega} is the residual vector from fitting X_{1,\omega} as a function of X_{2,\omega}. The building
blocks needed are as follows:

H_{2,\omega} = X_{2,\omega}(X_{2,\omega}'X_{2,\omega})^{-1}X_{2,\omega}' = \frac{1}{1 + a^2\hat\beta_2^2}\,(X_2 - a\hat\beta_2 X_1)(X_2 - a\hat\beta_2 X_1)'

r_{1,\omega} = (I - H_{2,\omega})X_{1,\omega} = (1 - a\hat\beta_1 - ac\hat\beta_2)X_1 + cX_2 + ar

r_{1,\omega}'y = (1 - a\hat\beta_1 - ac\hat\beta_2)\hat\beta_1 + c\hat\beta_2 + a(1 - \hat\beta_1^2 - \hat\beta_2^2)

\|r_{1,\omega}\|^2 = (1 - a\hat\beta_1 - ac\hat\beta_2)^2 + c^2 + a^2(1 - \hat\beta_1^2 - \hat\beta_2^2),

where c = \frac{a\hat\beta_2(1 - a\hat\beta_1)}{1 + a^2\hat\beta_2^2}. It follows that

\hat\beta_{1,\omega} = \frac{(1 - a\hat\beta_1 - ac\hat\beta_2)\hat\beta_1 + c\hat\beta_2 + a(1 - \hat\beta_1^2 - \hat\beta_2^2)}{(1 - a\hat\beta_1 - ac\hat\beta_2)^2 + c^2 + a^2(1 - \hat\beta_1^2 - \hat\beta_2^2)}.
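The closed form above can be checked against a direct refit of the perturbed model. The sketch below compares (3.33), with c as derived here, to a least-squares refit on the perturbed design; the orthonormal regressors and unit-length response are simulated and illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50
# an orthonormal pair of regressors and a unit-length response
Q, _ = np.linalg.qr(rng.normal(size=(n, 3)))
X1, X2 = Q[:, 0], Q[:, 1]
y = rng.normal(size=n)
y /= np.linalg.norm(y)

b1, b2 = X1 @ y, X2 @ y                    # beta_1-hat, beta_2-hat
a = 0.4                                    # perturbation magnitude

r = y - b1 * X1 - b2 * X2
X1w = a * r + (1 - a * b1) * X1            # perturbed columns, as in (3.32)
X2w = X2 - a * b2 * X1

# closed form for the perturbed estimate (3.33), with c as derived above
c = a * b2 * (1 - a * b1) / (1 + a**2 * b2**2)
num = (1 - a * b1 - a * c * b2) * b1 + c * b2 + a * (1 - b1**2 - b2**2)
den = (1 - a * b1 - a * c * b2)**2 + c**2 + a**2 * (1 - b1**2 - b2**2)
b1w_formula = num / den

# direct refit of the no-intercept model on the perturbed design
Xw = np.column_stack([X1w, X2w])
b1w_refit = np.linalg.lstsq(Xw, y, rcond=None)[0][0]
```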
For the sake of comparison, we can also consider perturbing only the first
column. Perturbation in the direction of v_{max} when directing effects on \hat\beta_1 results in
the following perturbed X matrix:

X_\omega = X + a^*W^* = X + a^*\left[\; r - \hat\beta_1 X_1 \;\Big|\; 0 \;\right]
= \left[\; a^*r + (1 - a^*\hat\beta_1)X_1 \;\Big|\; X_2 \;\right],   (3.34)

as per (3.8). Here, the perturbed estimate is given by

\hat\beta_{1,\omega} = \frac{(1 - a^*\hat\beta_1)\hat\beta_1 + a^*(1 - \hat\beta_1^2 - \hat\beta_2^2)}{(1 - a^*\hat\beta_1)^2 + a^{*2}(1 - \hat\beta_1^2 - \hat\beta_2^2)}.

In order for the two perturbations to be comparable, they should be equal in size.
This is accomplished by stacking the columns of the perturbation matrices and
making the resulting vectors equal length. In other words, the Frobenius norms
of aW and a^*W^* should be equal. Letting a^* = a/\sqrt{1 - \hat\beta_2^2} provides the necessary
adjustment.
Now we can plot \hat\beta_{1,\omega} for some selected values of \hat\beta_1 and \hat\beta_2, considering values
of a ranging from, say, -\sqrt{2} to \sqrt{2}. This eliminates perturbation matrices with
Frobenius norms that are larger than that of the unperturbed X matrix. Figure 3.6
shows plots of \hat\beta_{1,\omega} for four sets of parameter estimates. The solid lines represent the
parameter estimate in the direction v_{max} for the two-column perturbation, while the
dashed lines represent parameter estimates in the direction v_{max} for the one-column
perturbation. All of the influence curves in plots (a) and (b) have a value of .2 under
the null perturbation, i.e., when a = 0. Likewise, the curves in plots (c) and (d) are
all equal to .5 at the null perturbation. In all plots we see that the single column
perturbation does not have the same effect as the dual column perturbation. The
discrepancy between their effects is larger when \hat\beta_2 is larger (plots (b) and (d)).
3.4.4 Influence in Principal Components Regression

Consider the canonical form of the model, as given in Chapter 1:

Y = Z\alpha + \epsilon,   (3.35)

where Z = X\varphi and \alpha = \varphi'\beta. The following theorem proves that the direction of
maximum curvature is the direction of maximum directed influence on \hat\alpha_k.

Theorem 3.2

Let \hat\alpha denote the MLE of \alpha for model (3.35). The pairs \lambda_j, v_j given in Theorem
3.1 are the maximum curvature and direction of maximum curvature for directed
influence on \hat\alpha_{k-j+1}, j = 1, ..., k.
Figure 3.6: Effects of single and double column perturbations on \hat\beta_1. Solid line:
double column; dashed line: column 1 only. (a) Influence curves for \hat\beta_1 = .2,
\hat\beta_2 = .4; (b) influence curves for \hat\beta_1 = .2, \hat\beta_2 = .7; (c) influence curves for
\hat\beta_1 = .5, \hat\beta_2 = .4; (d) influence curves for \hat\beta_1 = .5, \hat\beta_2 = .7.
Proof of Theorem 3.2 We begin by obtaining update formulas for \Delta and \ddot{L} as
per Section 2.7. Since \alpha = \varphi'\beta is a linear reparameterization,

\Delta_\alpha' = \Delta_\beta'\varphi = \frac{1}{\hat\sigma^2}\left( I_k \otimes r - \hat\beta \otimes X \right)\varphi   (3.36)

and

-\ddot{L}_{\alpha\alpha}^{-1} = \left( \frac{1}{\hat\sigma^2}\,\varphi'X'X\varphi \right)^{-1} = \hat\sigma^2 D(1/\delta).   (3.37)

Next, we find maximum directed influence on \hat\alpha_{k-j+1} using the results of Beckman
et al. (1987):

v_{max} \propto \Delta_\alpha'\left( -\ddot{L}_{\alpha\alpha}^{-1} \right) d_{k-j+1} \propto \left( I_k \otimes r - \hat\beta \otimes X \right)\varphi_{k-j+1}
= \varphi_{k-j+1} \otimes r - \hat\beta \otimes X\varphi_{k-j+1},

and

C_{max} = \frac{2}{\hat\sigma^2\,\delta_{k-j+1}}\left( \varphi_{k-j+1} \otimes r - \hat\beta \otimes X\varphi_{k-j+1} \right)'\left( \varphi_{k-j+1} \otimes r - \hat\beta \otimes X\varphi_{k-j+1} \right)
= \frac{2}{\hat\sigma^2\,\delta_{k-j+1}}\left( \|r\|^2 + \|\hat\beta\|^2\,\delta_{k-j+1} \right)
= 2\left( \frac{n}{\delta_{k-j+1}} + \frac{\|\hat\beta\|^2}{\hat\sigma^2} \right).

Since these expressions are the j-th eigensolution of F^{[\beta]}, the proof is complete. \square
This theorem gives added insight into the direction of maximum curvature.
In short, v_{max} is the direction that changes \hat\alpha_k the most. Similarly, the j-th largest
eigenvalue and associated eigenvector specifically correspond to effects on \hat\alpha_{k-j+1}.
Also, the sizes of the curvatures show that the estimated coefficients for the most
important principal component regressors are also the least sensitive.
Example 1

Consider the Scottish hills data set described in Chapter 1. The purpose of
this example is twofold: to demonstrate how the X-matrix results can be displayed
graphically, and to demonstrate how collinearity affects the curvature values when
examining influence on \hat\beta.

Figure 3.7 gives the direction of maximum curvature for directed influence on
\hat\sigma^2. The first 35 elements correspond to perturbations of the DISTANCE regressor
values, while elements 36-70 correspond to perturbations of the CLIMB regressor
values. The value for race 18's DISTANCE is the largest contributor to this
influential perturbation. Elements 7 (DISTANCE for race 7) and 53 (CLIMB for
race 18) are also large.
The information in Figure 3.7 can be difficult to discern because the
perturbation vector enters into the likelihood in the form of a matrix, W. Another
way to display v_{max} is to first take its absolute value and then to consider the
elements corresponding to each column of W. Each portion can then be plotted
against indices 1, ..., n and shown together (Figure 3.8(a)). This graphical display
provides a "profile plot" of each column of W's contribution to the direction of
maximum curvature. Here we see again that the elements corresponding to observations
18 and 7 are the most influential. There is also some evidence that measurement
error in the DISTANCE values (solid line) is more influential than error in the
CLIMB values (dashed line).

Figures 3.8(b) and 3.8(c) provide profile plots of the absolute value of the
two directional vectors corresponding to v_1 and v_2 of F^{[\beta]}. The first plot draws
attention to observations 7, 11, and 18, while no observations appear noteworthy in
the second plot.
At this point we turn our attention to the curvature values that accompany
these directional vectors (Table 3.3). The first column shows the curvatures for local
Table 3.3: Hills data; curvatures using X + W perturbation. Model I contains
DISTANCE and CLIMB. Model II contains DISTANCE and CLIMBRATE.

    Estimates in LD(ω)
    or LD^[θ1](ω)        Model I          Model II
    σ̂²                  λ_1 = 30.47      λ_1 = 28.52
    β̂                   λ_1 = 21.15      λ_1 = 16.41
                         λ_2 = 16.49      λ_2 = 16.25
influence when fitting the regression model for winning time as a linear combination
of DISTANCE and CLIMB. However, as was evident in Figure 1.5(c), the two
regressors are linearly related. An alternative model can be fit by replacing climb in
feet with climb in feet per distance travelled, CLIMBRATE. This regressor does
not have the strong positive association with distance that the original coding of
CLIMB has. This should reduce the collinearity in the model, thereby reducing
sensitivity to perturbations in the X matrix.
Using standardized regressors, the new model is fit. Column 2 of Table 3.3
shows the curvature values for this new model. C_max for directed influence on β̂
has been substantially reduced, indicating that the MLEs for the second model are
indeed less sensitive to perturbations in the X matrix.
Figures 3.9(a), (b) and (c) provide the profile plots of the absolute value
of the directional vectors that accompany the curvatures in Table 3.3, Column II.
All three plots tend to indicate DISTANCE values as being more influential than
CLIMBRATE values, and all three plots show substantial change from those in
Figure 3.8. The plot for σ̂² now indicates the DISTANCE values for observations 7
and 18 as influential, rather than just 18. No elements stand out in v_max, while case
11's DISTANCE value is a very substantial contributor to v_2.
Figure 3.7: Hills data; X + W pert.; v_max for σ̂²
Example 2
We now consider applying the X + W perturbation to the school data and
examining directed influence on β̂. The purpose of this example is to demonstrate
alternative methods for displaying v_max when the number of regressors and/or
observations is large.

Figure 3.10 is the plot of v_max for directed influence on β̂. Elements 1-20
correspond to SALARY, elements 21-40 correspond to WCOLR, elements 41-60
correspond to SES, elements 61-80 correspond to TSCORE, and elements 81-100
correspond to MOMED. The two regressors that have little association with other
explanatory variables, SALARY and TSCORE, have small elements. In contrast,
some of the elements for the other regressors are more substantial. This plot does a
fair job of highlighting regressors, but a poor job of highlighting observations.
Figure 3.11 is the profile plot of v_max. It is easy to see that cases 3 and 18
have multiple regressor values that are influential. In addition, one of the values for
case 2 is influential. However, it is difficult to keep track of which line represents
which column. Another option is to plot "profiles" of each row, as in Figure 3.12.
Here, some of the elements of observations 2, 3, and 18 are highlighted.
Figure 3.8: Hills data; X + W pert.; column profiles of directional vectors: (a) |v_max|
for σ̂², (b) |v_max| for β̂, (c) |v_2| for β̂. DISTANCE values are connected by a solid
line; CLIMB values are connected by a dashed line.
Figure 3.9: Hills data; Model II; X + W pert.; column profiles of directional vectors:
(a) |v_max| for σ̂², (b) |v_max| for β̂, (c) |v_2| for β̂. DISTANCE values are connected
by a solid line; CLIMBRATE values are connected by a dashed line.
Figure 3.10: School data; X + W pert.; v_max for directed influence on β̂

Figure 3.11: School data; X + W pert.; column profiles of v_max for directed influence
on β̂. Thick solid line: SALARY; thin solid line: WCOLR; dot-dashed line: SES;
small-dashed line: TSCORE; large-dashed line: MOMED.
Figure 3.12: School data; X + W pert.; row profiles of |v_max| for directed influence
on β̂, with large elements labeled.
A bubble plot of |v_max| provides the best graphical display of the information
in v_max (Figure 3.13). Each element is represented by a point plotted according to
both the corresponding regressor (x-axis) and case (y-axis). Around each point is a
bubble with radius equal to the absolute value of the element, and so large bubbles
indicate large contributors to the direction of maximum curvature. Again, we see
that the elements corresponding to x_{18,2}, x_{18,5}, x_{3,2}, x_{3,5}, and x_{2,3} have the largest
influence, as evidenced by having the largest bubbles.
Figure 3.13: School data; X + W pert.; bubble plot of |v_max| for β̂ (columns index
the regressors, rows the cases)
CHAPTER 4
LOCAL ASSESSMENT FOR OPTIMAL ESTIMATING FUNCTIONS
It is possible to examine the local influence of perturbations on measures other
than the likelihood displacement, LD(ω), and the profile likelihood displacement,
LD^[θ1](ω). Several authors have applied local influence techniques to specific
measures of influence. For instance, Lawrance (1988) and Wu and Luo (1993c)
looked at influence on the Box-Cox transformation parameter. Also, influence
has been examined on the residual sums of squares in regression (Wu and Luo
1993b), deviance residuals and confidence regions in GLMs (Thomas and Cook 1990;
Thomas 1990), and goodness-of-link test statistics in GLMs (Lee and Zhao 1997).
However, considerably less attention has been given to assessing local influence
when estimation is not performed by ML. Some work has been done on extensions
to Bayesian analysis (McCulloch 1989; Lavine 1992; Pan et al. 1996) and L_p-norm
estimation (Schall and Gonin 1991).
In this chapter, local influence techniques are extended in two ways. First,
we consider the influence of perturbations on a general class of influence measures.
Second, we address parameter estimates that are found via estimating equations.
The intent is to unify nearly all previous work on this methodology and provide a
conceptual basis and computational tools for future applications. The first three
sections outline the inferential platform, a general class of influence measures,
and assumptions about the perturbations. Next, techniques for local assessment
of perturbations are discussed. In Section 4.4, we give some algebraic results
for the two main diagnostic quantities: the gradient vector and the acceleration
matrix. Section 4.5 contains computational formulas, while Sections 4.6-4.8 contain
applications. Several measures, including LD(ω) and LD^[θ1](ω), are considered in
Section 4.6 using ML. Section 4.7 addresses influence on the generalized variance
with an emphasis on ML. Finally, the last section considers quasi-likelihood.
4.1 Inferential Platform
This section outlines assumptions about estimation, the nature of
perturbations, and how the effects of perturbation are measured.
4.1.1 Estimating Functions
Suppose that a statistical analysis is performed to estimate p parameters,
θ. Further, suppose that point estimates, θ̂, are obtained as solutions to a set of p
unbiased estimating equations,

    h(θ; y) = 0,    (4.1)

where each estimating function, h_i(θ; y), satisfies

    E_θ(h_i(θ; y)) = 0,    i = 1, ..., p.
Sometimes these estimating equations arise from minimizing or maximizing some
criterion with respect to θ. Examples include least-squares normal equations, score
equations, and equations resulting from minimizing a Bayes risk. Quasi-likelihood
(QL) equations (Wedderburn 1974) were originally motivated by maximizing a
quasi-likelihood function, but the existence of the objective function is not necessary.
Generalized estimating equations (GEEs) (Liang and Zeger 1986), like QL equations,
merely require assumptions about first and second moments.
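As a concrete sketch of how such estimating equations are solved in practice (an illustration of ours, not an example from the text), consider the score equations of a Poisson regression, h(β; y) = X′(y − μ(β)) = 0, solved by Newton-Raphson; all variable names below are hypothetical.

```python
import numpy as np

def score(beta, X, y):
    # Estimating functions h(beta; y) = X'(y - mu) for Poisson regression
    mu = np.exp(X @ beta)
    return X.T @ (y - mu)

def neg_hessian(beta, X):
    # Negative derivative of h with respect to beta: X' diag(mu) X
    mu = np.exp(X @ beta)
    return X.T @ (X * mu[:, None])

def solve_ee(X, y, tol=1e-10, max_iter=50):
    # Newton-Raphson root-finding on the estimating equations h = 0
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        step = np.linalg.solve(neg_hessian(beta, X), score(beta, X, y))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))
beta_hat = solve_ee(X, y)
```

Here the estimating functions happen to be score equations, but the same iteration applies to a QL or GEE setup, where only first- and second-moment assumptions are made.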
The theory of estimating equations (EEs) dates back to early work by V. P.
Godambe (Godambe 1960; Godambe 1976). There has been renewed interest in the
subject due to new estimation techniques for semiparametric models and models for
clustered non-normal responses (Zeger and Liang 1986; Prentice 1988; McCullagh
and Nelder 1989; Desmond 1997a). A considerable amount of work has centered
89
on obtaining optimal estimating functions (OEFs) (Godambe 1976; Crowder 1986;
Crowder 1987; Godambe and Heyde 1987; Firth 1987; Godambe and Thompson
1989; Godambe 1991). To this end, the efficiency matrix of a set of p estimating
functions is defined as
Eff(/i) = (E
dh/_
dG
)'W[h]{E
dh
89'
)\ (4.2)
Functions that minimize this nonnegative definite matrix are considered optimal.
Roughly speaking, this index of efficiency is small when the variance of the
estimating functions is small and the average gradient is large. Provided that the
estimating equations arise from maximizing or minimizing some function, large
gradients indicate that the objective function is not flat around its extremum.
When a fully parametric form for the data is assumed, the score
functions are the unique OEFs up to a scalar multiple (Godambe 1960; Bhapkar
1972). However, with fewer assumptions about the data, it is generally not possible
to find a unique set of optimal estimating functions. Instead, optimality is bestowed
upon estimating functions within a certain class. One such class consists of
estimating functions that are linear in the data.
Under mild regularity conditions, the asymptotic variance of θ̂ is in fact given
by (4.2) (Morton 1981; McCullagh 1983; Crowder 1986). Thus, minimizing Eff(/i)
is equivalent to minimizing the asymptotic variance of the parameter estimates.
Note that the first interpretation of the efficiency matrix is appealing in that it is a
finite sample index, but somewhat awkward in that it is based on a property of the
estimating functions. The second interpretation is appealing in that it is based on a
property of the estimates, but limited in that it requires asymptotics.
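To make the finite-sample interpretation of (4.2) concrete, here is a small numerical check (our own illustration, not from the text) that for Poisson-regression score functions the efficiency matrix reduces to the familiar asymptotic variance (X′WX)⁻¹.

```python
import numpy as np

# For score functions h = X'(y - mu) in Poisson regression:
#   E[dh/dtheta'] = -X' W X   and   Var[h] = X' W X,  with W = diag(mu),
# so Eff(h) = (E dh/dtheta')^{-1} Var[h] ((E dh/dtheta')^{-1})' = (X'WX)^{-1},
# the usual asymptotic variance of the MLE.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
beta = np.array([0.2, -0.4])
W = np.diag(np.exp(X @ beta))

E_grad = -X.T @ W @ X          # expected gradient of the estimating functions
V_h = X.T @ W @ X              # variance of the estimating functions
eff = np.linalg.inv(E_grad) @ V_h @ np.linalg.inv(E_grad).T

assert np.allclose(eff, np.linalg.inv(X.T @ W @ X))
```

The check illustrates the remark above: a small variance V[h] and a large average gradient E[∂h/∂θ′] both shrink Eff(h).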
For the purposes of applying local influence diagnostics we assume only that
the estimating functions have been standardized. That is, given a set of unbiased
estimating functions, h*(θ; y), we utilize the standardized estimating functions given
by

    h(θ; y) = −( E[∂h*/∂θ′] )′ ( V[h*] )⁻¹ h*.    (4.3)
This standardization has three implications. First, (V[h(θ; y)])⁻¹ is equivalent
to (4.2) and hence equivalent to the asymptotic variance of the parameter estimates.
In addition, the standardized estimating functions are optimal in the following sense.
Given a set of unbiased estimating functions, {h*_i(θ; y)}, the OEFs amongst the
class of linear combinations of the h*_i are given by (4.3). Hence, the standardized
estimating functions are an optimal rescaling of the original estimating functions.
Note that the negative sign is present in (4.3) so that the score equations, rather
than the negatives of the score equations, are considered standardized. Finally,
standardized estimating functions are "information unbiased" (Lindsay 1982),
meaning that the following equality holds:

    −E[∂h/∂θ′] = E[hh′] = V[h].    (4.4)

Hence, V[h] = −E[∂h/∂θ′]. This fact is used later in the chapter to derive influence
diagnostics for confidence ellipsoids.
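The information-unbiasedness identity (4.4) can be verified numerically for the standardization (4.3); the sketch below treats E[∂h*/∂θ′] and V[h*] as arbitrary matrices D and V (hypothetical values, chosen only so that V is positive definite).

```python
import numpy as np

rng = np.random.default_rng(2)
p = 3
# Arbitrary expected-gradient matrix D = E[dh*/dtheta'] and variance V = Var[h*]
D = rng.normal(size=(p, p))
A = rng.normal(size=(p, p))
V = A @ A.T + p * np.eye(p)        # symmetric positive definite

# Standardized functions h = -D' V^{-1} h* (equation 4.3) have
#   Var[h]         = D' V^{-1} V V^{-1} D = D' V^{-1} D
#   -E[dh/dtheta'] = D' V^{-1} D
# so information unbiasedness (4.4) holds by construction.
V_inv = np.linalg.inv(V)
var_h = (D.T @ V_inv) @ V @ (V_inv @ D)
neg_exp_grad = D.T @ V_inv @ D

assert np.allclose(var_h, neg_exp_grad)
```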
4.1.2 Perturbations
Perturbations are again introduced as a vector of q real numbers, ω. These
perturbations may affect the data, the model and its assumptions, or perhaps even
just the estimation technique itself. Examples of data perturbations are minor
changes or rounding errors in the response and the explanatory variables. An
example of model perturbation is the variance perturbation scheme used in Chapters
1 and 2. Finally, perturbing the contribution of each observation to the estimating
functions by case weights is an example of perturbing the estimation technique.
This perturbation does not affect the distributional assumptions of the data, but
rather how each observation is utilized for inference. Another perturbation to the
inference method would be to allow the value of the hyperparameters in a Bayesian
analysis to vary. Though some might argue that the hyperparameters are part of
the model, their values are often chosen for computational convenience.
For the purposes of this chapter, we assume that the perturbation scheme
results in a set of p perturbed estimating equations,

    h_ω(θ; y) = 0,    (4.5)

which have solutions θ̂_ω. It is also assumed that there exists a null perturbation, ω_0,
whose estimating equations are identical to the original ones. Finally, it is assumed
that there are no constraints on ω that would limit its values in a neighborhood of
ω_0. For example, constraining the space of variance perturbations to positive values
is allowed. However, constraining the perturbations to sum to one is not allowed.
These kinds of constraints could be accommodated, but are not considered here.
4.1.3 Influence Measures
The final item to specify is a quantity whose sensitivity to perturbation is of
interest. This influence measure serves as a barometer for how perturbation affects
inference. The measure can depend on ω directly and indirectly via some r × 1
vector of functions of the perturbed parameter estimates, γ(θ̂_ω). Some influence
measures, m(ω, γ(θ̂_ω)), may be appropriate regardless of the method of estimation
and the perturbation, while others may be more application-specific.
Examples of rather general objective measures are the following:

    m_1 = x′θ̂_ω    (4.6)
    m_2 = θ̂_i / θ̂_{iω}    (4.7)
    m_3 = Σ_i (y_i − μ̂_{iω})² / μ̂_{iω},    (4.8)

where E_ω(y_i) = μ_{iω}. None of these three measures depends on ω directly. The first is
a linear combination of the perturbed parameters. Here, γ(θ̂_ω) = x′θ̂_ω is a scalar.
The second measure is the ratio of the i-th unperturbed parameter estimate to the i-th
perturbed estimate. Again, γ(θ̂_ω) = θ̂_{iω} is a scalar. The third measure resembles a
Pearson goodness-of-fit statistic based on the perturbed estimates. Since it is not yet
specified how the means, μ̂_{iω}, depend on the estimates, we simply write γ(θ̂_ω) = θ̂_ω.
The following measures are more application-specific, in that they depend on
how inference is performed and perhaps even upon the perturbation itself:

    m_4 = LD(ω) = 2[L(θ̂; y) − L(θ̂_ω; y)]    (4.9)
    m_5 = LD^[θ1](ω) = 2[L(θ̂_1, θ̂_2; y) − L(θ̂_{1ω}, g(θ̂_{1ω}); y)]    (4.10)
    m_6 = LD*(ω) = 2[L_ω(θ̂_ω; y) − L_ω(θ̂; y)]    (4.11)
    m_7 = ∫ log( f(y | θ̂) / f(y | θ̂_ω) ) f(y | θ̂) dy    (4.12)
    m_8 = log |V̂[θ̂_ω]|.    (4.13)
The first of these measures is the likelihood displacement, for which γ(θ̂_ω) = θ̂_ω.
The second is the profile likelihood displacement, which has γ(θ̂_ω)′ = (θ̂_{1ω}′, g(θ̂_{1ω})′),
where g(θ̂_{1ω}) maximizes L(θ̂_{1ω}, θ_2; y) with respect to θ_2. The third
measure is a likelihood displacement with a "moving frame," because the base
likelihood of the measure is the perturbed one. Obviously, these three measures are
intended for use when estimation is performed by maximum likelihood.
The influence measure m_7 is a Kullback-Leibler divergence between the
densities f(y | θ̂) and f(y | θ̂_ω). The quantity is nonnegative, achieving its
minimum value of zero when the two densities are equivalent. Here, γ(θ̂_ω) = θ̂_ω.
The last measure addresses sensitivity of asymptotic standard errors. Recall
that the determinant of a matrix equals the product of its eigenvalues, and when
the matrix is a variance-covariance matrix for a vector of estimates θ̂, the square
roots of its eigenvalues are proportional to the lengths of the principal axes of a
confidence ellipsoid for θ. Hence, the determinant of the variance provides a single
measure of the confidence ellipsoid's size. Provided the estimating functions have
been standardized, m_8 is the log of the estimated generalized asymptotic variance
of θ̂. It is a function of ω both directly and indirectly via γ(θ̂_ω) = θ̂_ω.
As a final note, it is not necessary for the influence measure to be a function
of point estimators. In Bayesian analyses, perturbation can be measured by
the Kullback-Leibler distance between the perturbed and unperturbed posterior
densities. Such a measure would depend on ω, but not on any point estimates of
θ. McCulloch (1989) developed a general local influence technique for assessing
perturbation of the prior distribution in a Bayesian analysis. Other types of
perturbation analysis for Bayesian models are found in Kass et al. (1989), Geisser
(1991), Geisser (1993), and Weiss (1996).
4.2 Local Assessment of Perturbations
At this point we have set the stage for our sensitivity analysis: a perturbation
is introduced into the inference, and a measure of the effects is chosen. The next
step is to obtain some useful diagnostics. This section describes how a Taylor series
expansion of m and some differential geometry results can be used to describe the
function.
4.2.1 First-order Local Assessment

As in Chapter 1, let ω(a) = ω_0 + av be a vector in the space of
perturbations. The unit-length vector v determines the direction from ω_0, and the
value a determines the distance from ω_0 in Ω. The value of m for a perturbation of
length a in direction v can be approximated by a first-order Taylor-series expansion
around the null perturbation:

    m(a) ≈ m(0) + (dm/da)(a − 0)
         = m(0) + a (∂m/∂ω)′ v    (4.14)
         = m(0) + a ṁ′v,
where the partial derivatives are evaluated at a = 0. The quantity ṁ′v has a
geometric interpretation. Consider the influence graph of m (i.e., the influence
function described as a surface in ℝ^{q+1}). For each direction v, there is an associated
lifted line on the influence graph. The slope of this "slice" of the graph is given by

    S = ṁ′v,    (4.15)

and it is sometimes called the directional derivative. The direction with the largest
absolute slope is proportional to ṁ, the gradient vector.
Both the directional derivative and the gradient vector have diagnostic value.
To explain, recall from Chapter 2 that we may use a perturbation's Euclidean
distance from the null perturbation, ‖ω − ω_0‖, as a measure of its size. Further, let
us define the effect of the perturbation as m(a) − m(0). If m(0) = 0, the effect of
a perturbation is the absolute value of m(a). The directional derivative can then
be interpreted as the approximate effect of a size-1 perturbation in direction v.
Furthermore, suppose that we have perturbations in two different directions: ω_A
in direction v_A and ω_B in direction v_B. If |S_A| is k times larger than |S_B|, then
perturbation ω_B must be k times larger than ω_A to have the same effect. Therefore,
directional derivatives are measures of the relative influence of perturbations.
Additionally, the gradient vector shows how perturbations can be scaled to achieve
maximum local influence. That is, ω_0 ± (ṁ/‖ṁ‖) has the largest approximate
effect, ‖ṁ‖, among perturbations of size 1.
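A minimal numerical sketch of a first-order assessment (using a hypothetical influence measure, not one from the text): estimate the gradient ṁ at ω_0 by central differences and confirm that the slope S = ṁ′v is largest along ṁ itself.

```python
import numpy as np

# Hypothetical influence measure m(omega) with null perturbation omega0 = 0.
def m(omega):
    return omega[0]**2 + 0.5 * omega[0] * omega[1] + 3.0 * omega[1]

omega0 = np.zeros(2)
eps = 1e-6
# Central-difference estimate of the gradient mdot at the null perturbation
grad = np.array([(m(omega0 + eps * e) - m(omega0 - eps * e)) / (2 * eps)
                 for e in np.eye(2)])

# Slope of the lifted line in direction v is S = grad'v; it is maximized
# over unit-length v when v is proportional to the gradient.
v_max = grad / np.linalg.norm(grad)
slopes = [grad @ v for v in (v_max, np.array([1.0, 0.0]), np.array([0.0, 1.0]))]
assert slopes[0] >= max(abs(s) for s in slopes[1:]) - 1e-8
```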
If m(0) is a local minimum, maximum, or saddle point of the influence graph,
then the gradient is zero. Here, a first-order local assessment does not provide any
diagnostic information. Instead, we can turn to a secondorder approximation of the
influence graph.
4.2.2 Second-order Local Assessment

A second-order Taylor series expansion of m is given by

    m(a) ≈ m(0) + (dm/da)(a − 0) + (1/2)(d²m/da²)(a − 0)²
         = m(0) + a ṁ′v + (a²/2) v′Mv,    (4.16)

where the derivatives are again evaluated at a = 0. The quantity A = v′Mv also
has a geometric interpretation: it is the rate of change in the slope of the lifted line
associated with direction v, i.e., it is the acceleration of the curve. The matrix M
is the acceleration matrix, and the eigenvector associated with its largest absolute
eigenvalue is the direction of maximum absolute acceleration. If the gradient
vanishes, the second-order approximation simplifies to:

    m(a) ≈ m(0) + (a²/2) v′Mv.    (4.17)
Both the acceleration and the acceleration matrix have diagnostic value,
particularly when the gradient vanishes. In this case, A can be interpreted as the
approximate effect of a size-√2 perturbation in direction v. In addition, suppose
that we have perturbations in two different directions: ω_A and ω_B. If |A_A| is k
times larger than |A_B|, then perturbation ω_B must be √k times larger than ω_A to
have the same effect. Finally, the eigensystem of the acceleration matrix can be
used to identify influential perturbations. The largest absolute eigenvalue of M
is the maximum approximate effect of a size-√2 perturbation, and the associated
eigenvector is the direction of maximum approximate effect. Other eigensolutions
can also be used.
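A small numerical illustration of these second-order diagnostics (the matrix M here is hypothetical): the eigenvector of the acceleration matrix belonging to the largest absolute eigenvalue maximizes |v′Mv| over unit-length directions.

```python
import numpy as np

# A hypothetical symmetric acceleration matrix M (not from the dissertation).
M = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 1.0]])

eigvals, eigvecs = np.linalg.eigh(M)    # eigenvalues in ascending order
i = np.argmax(np.abs(eigvals))
v_star = eigvecs[:, i]                  # direction of maximum absolute acceleration

# Check: |v'Mv| over random unit vectors never exceeds |v_star' M v_star|.
rng = np.random.default_rng(3)
for _ in range(200):
    v = rng.normal(size=3)
    v /= np.linalg.norm(v)
    assert abs(v @ M @ v) <= abs(v_star @ M @ v_star) + 1e-9
```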
4.3 Acceleration, Curvature and Normal Curvature
Two more geometric descriptions of the influence measure can be calculated:
curvatures and normal curvatures. These quantities are functions of both the
gradient vector and the acceleration matrix. When the gradient vanishes, the
curvature, normal curvature, and the absolute value of the acceleration associated
with a directional vector are all equivalent. From (1.5) the curvature associated with
direction v at the null perturbation is

    C = | d²m(a)/da² | / [ 1 + (dm(a)/da)² ]^{3/2} evaluated at a = 0.    (4.18)

Using (4.16), it follows that

    C = |v′Mv| / (1 + v′ṁṁ′v)^{3/2} = |v′Mv| / ( v′(I_q + ṁṁ′)v )^{3/2}.    (4.19)
If the slope is zero, then C = |A|. As mentioned previously, curvature is a measure
of how quickly a curve turns.
The notion behind normal curvature is a bit more complicated. The vector
(ω(a)′, 0)′ in ℝ^{q+1} is projected onto the tangent plane of the influence graph rather
than the graph itself. A plane is then created that contains both the projection
as well as the vector that is orthogonal to the tangent plane, the principal normal
vector. This plane then intersects the surface to create a normal section. Using the
tangent plane as the basis, curvature of the normal section can be computed as if it
were a simple plane curve. The resulting normal curvature is a description of how
the influence graph deviates from its tangent plane. The normal curvature associated
with direction v at the null perturbation is (Cook 1986)

    NC = |v′Mv| / [ (1 + ṁ′ṁ)^{1/2} v′(I_q + ṁṁ′)v ].    (4.20)

If the gradient is zero, then NC = C = |A|.
The two geometric descriptions NC and C can be distinguished as follows.
Curvature is the reciprocal of the radius of the best-fitting circle to the lifted line
associated with v, while normal curvature is the reciprocal of the radius of the best-fitting
circle to the normal section associated with v. If the gradient is zero, lifted
lines are normal sections, implying that NC = C = |A|. Additional insights are
found in Cook (1986), Wu and Luo (1993a), Wu and Luo (1993b), and Fung and
Kwan (1997).
4.3.1 How to Use Local Assessment of Perturbations
Local assessment of perturbations can be used to (1) assess the influence
of aspects of the statistical problem on inferences, and (2) compare the relative
sensitivity of parameter estimates.
First, we define the local influence associated with a length-one vector v to
be:

    li(v) = ṁ′v       if ṁ ≠ 0
          = |v′Mv|    if ṁ = 0.    (4.21)

Thus, li(v) is simply the directional derivative unless the gradient vanishes, in which
case it is the normal curvature. The direction of maximum local influence, v_max, is
the main diagnostic tool, and it can be displayed graphically using an index plot or
star plot to informally assess the influence of perturbations. Additionally, the local
influences of basic perturbations can be plotted against case indices to compare the
effects of one-at-a-time perturbations.
The relative sensitivity of p parameter estimates can be assessed by plotting
the p maximum local influences for m_1 = θ̂_{1ω}, ..., m_p = θ̂_{pω}. Measures with the
largest maximum local influences are most sensitive to the perturbation. Two
cautions with this technique should be noted. First, the metric of the parameter
estimates ought to be comparable. If the metrics are not comparable, standardized
parameter estimates should be used. The second caution pertains to first- and
second-order influence. Typically, the set of p measures are assessed by a first-order
approximation. However, if a subset of the measures requires second-order
assessment, then the following revised definition can be used to make the local
influences comparable:

    li*(v) = |ṁ′v|           if ṁ ≠ 0
           = |(1/2) v′Mv|    if ṁ = 0.    (4.22)

This ensures that all influence measures are compared by the approximate effects of
length-one perturbations.
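The revised definition (4.22) amounts to a small dispatch on whether the gradient vanishes; a sketch of such a helper (our own, not the dissertation's code):

```python
import numpy as np

def local_influence(v, grad, M, tol=1e-12):
    """Revised local influence li*(v) of equation (4.22): |grad'v| when the
    gradient is nonzero, else |v'Mv|/2. (Illustrative helper function.)"""
    if np.linalg.norm(grad) > tol:
        return abs(grad @ v)
    return abs(v @ M @ v) / 2.0

# With a vanishing gradient, li* falls back to half the absolute acceleration,
# so first- and second-order measures are compared on the same scale.
M_demo = np.array([[2.0, 0.5],
                   [0.5, 1.0]])
v_demo = np.array([1.0, 0.0])
print(local_influence(v_demo, np.zeros(2), M_demo))            # second-order branch
print(local_influence(v_demo, np.array([0.3, -0.1]), M_demo))  # first-order branch
```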
4.4 Algebraic Expressions for Local Assessment
In this section, algebraic expressions for local influence diagnostics are
derived.
4.4.1 First-order Assessment
Theorem 4.1

The directional derivative in direction v of influence measure m(ω, γ(θ̂_ω)) at the null
perturbation is given by:

    S = ( s_ω′ + s_γ′ K J ) v,    (4.23)

where

    s_ω = ∂m(ω, γ(θ̂))/∂ω |_0,    s_γ = ∂m(ω, γ(θ̂_ω))/∂γ(θ̂_ω) |_0,
    K = ∂γ(θ̂_ω)/∂θ̂_ω′ |_0,    J = ∂θ̂_ω/∂ω′ |_0,

and |_0 denotes evaluation at the unperturbed fitted model: ω = ω_0 and θ̂_ω = θ̂.
Proof of Theorem 4.1. Taking partial derivatives of m we find

    dm/da = (∂m/∂ω′)(∂ω/∂a)
          = [ ∂m(ω, γ(θ̂_ω))/∂ω′
              + ( ∂m(ω, γ(θ̂_ω))/∂γ(θ̂_ω)′ )( ∂γ(θ̂_ω)/∂θ̂_ω′ )( ∂θ̂_ω/∂ω′ ) ] v.

Evaluating at the original model completes the proof. □
Corollary 4.1

The gradient is given by

    ṁ = s_ω + J′K′s_γ.    (4.24)
4.4.2 Second-order Assessment

In order to derive a general expression for the acceleration, a large
number of partial derivatives are needed. The following lemma provides an identity
used to derive the algebraic expression for M.
Lemma 4.1

Define the following:

    Γ_i = ∂²γ_i(θ̂_ω)/∂ω∂ω′ |_0,    G_i = ∂²γ_i(θ)/∂θ∂θ′ |_θ̂,    P_i = ∂²θ̂_{iω}/∂ω∂ω′ |_0.

Then,

    Γ = [I_r ⊗ J′] G J + [K ⊗ I_q] P,

for the stacked matrices

    Γ_{rq×q} = (Γ_1′, Γ_2′, ..., Γ_r′)′,    G_{rp×p} = (G_1′, G_2′, ..., G_r′)′,
    P_{pq×q} = (P_1′, P_2′, ..., P_p′)′,

and where J and K were defined in Theorem 4.1.
Proof of Lemma 4.1. Consider:

    ∂²γ_i(θ̂_ω)/∂ω∂ω′ = ∂/∂ω [ ( ∂γ_i(θ̂_ω)/∂θ̂_ω′ )( ∂θ̂_ω/∂ω′ ) ]
                      = ( ∂θ̂_ω/∂ω′ )′ [ ∂²γ_i(θ̂_ω)/∂θ̂_ω∂θ̂_ω′ ] ( ∂θ̂_ω/∂ω′ )
                        + Σ_{j=1}^p ( ∂γ_i(θ̂_ω)/∂θ̂_{jω} )( ∂²θ̂_{jω}/∂ω∂ω′ ).

Evaluating this equality at the null perturbation yields

    Γ_i = J′ G_i J + Σ_{j=1}^p K_{ij} P_j.

Stacking the Γ_i to form Γ completes the proof:

    Γ = ( (J′G_1J)′, ..., (J′G_rJ)′ )′ + [K ⊗ I_q] P
      = [I_r ⊗ J′] G J + [K ⊗ I_q] P. □    (4.25)
Theorem 4.2

The acceleration of influence measure m(ω, γ(θ̂_ω)) in direction v at the null perturbation
is given by

    A = v′ [ ∂²m(ω, γ(θ̂_ω))/∂ω∂ω′ |_0 ] v = v′Mv    (4.26)
      = v′ ( [I_q  J′K′] S̈ [I_q  J′K′]′ + [s_γ′ ⊗ J′] G J + [s_γ′K ⊗ I_q] P ) v,

where

    S̈ = [ ∂²m/∂ω∂ω′            ∂²m/∂ω∂γ(θ̂_ω)′
          ∂²m/∂γ(θ̂_ω)∂ω′       ∂²m/∂γ(θ̂_ω)∂γ(θ̂_ω)′ ]    (4.27)

evaluated at the null perturbation. Alternatively, we may write

    A = v′ ( [I_q  J′K′] S̈ [I_q  J′K′]′ + Σ_{i=1}^r s_{γi} Γ_i ) v    (4.28)
      = v′ ( [I_q  J′K′] S̈ [I_q  J′K′]′ + J′ [ Σ_{i=1}^r s_{γi} G_i ] J
             + Σ_{i=1}^p ( s_γ′K_i ) P_i ) v,

where s_{γi} is the i-th element of s_γ and K_i is the i-th column of K.
Proof of Theorem 4.2. First we define the following (q + r) × 1 vector:

    δ = ( ω′, γ(θ̂_ω)′ )′.

Then,

    ∂²m(ω, γ(θ̂_ω))/∂a² = v′ [ ∂²m(δ)/∂ω∂ω′ ] v
                        = v′ [ ( ∂δ/∂ω′ )′ ( ∂²m(δ)/∂δ∂δ′ )( ∂δ/∂ω′ )
                               + Σ_{i=1}^{q+r} ( ∂m(δ)/∂δ_i )( ∂²δ_i/∂ω∂ω′ ) ] v
                        = v′ [ ( ∂δ/∂ω′ )′ ( ∂²m(δ)/∂δ∂δ′ )( ∂δ/∂ω′ )
                               + Σ_{i=1}^{r} ( ∂m(δ)/∂γ_i(θ̂_ω) )( ∂²γ_i(θ̂_ω)/∂ω∂ω′ ) ] v,    (4.29)

where the last simplification occurs because ∂²ω_i/∂ω∂ω′ = 0. Examining the first
term of ∂²m(δ)/∂ω∂ω′ and evaluating at the null perturbation, we find

    ( ∂δ/∂ω′ )′ ( ∂²m(δ)/∂δ∂δ′ )( ∂δ/∂ω′ ) |_0 = [I_q  J′K′] S̈ [I_q  J′K′]′,    (4.30)

since ∂δ/∂ω′ |_0 = [I_q  J′K′]′. Examining the second term of ∂²m(δ)/∂ω∂ω′ and
evaluating at the null perturbation, we find

    Σ_{i=1}^{r} s_{γi} Γ_i = [s_γ′ ⊗ I_q] Γ
                           = [s_γ′ ⊗ I_q] ( [I_r ⊗ J′] G J + [K ⊗ I_q] P )
                           = [s_γ′ ⊗ J′] G J + [s_γ′K ⊗ I_q] P    (4.31)

by Lemma 4.1. □
4.4.3 Comments

While the algebraic expression for S is quite simple, the expression for A
is somewhat cumbersome. However, a second-order assessment would typically be
used only when the gradient vector vanishes. In most situations then, s_γ = 0, and
the acceleration matrix simplifies to

    M = [I_q  J′K′] S̈ [I_q  J′K′]′.    (4.32)

The algebraic expressions are helpful in that they decompose the diagnostics
into building blocks, most of which are immediately computable. Specifically, the
quantities s_ω, s_γ, S̈, G, and K should be immediate upon choosing an influence
measure and fitting the model. In the next section, it is shown that the matrices J
and P are also easily obtained in most situations.
4.5 Computational Formulas for J and P

In order for the formulas for the gradient and acceleration matrix to be of
practical use, they must be easily computed in a wide range of applications. In this
section, it is shown that the matrices J and P can be computed even when the
unperturbed parameter estimates are not closed-form. The only requirement is that
the perturbed estimating functions be closed-form expressions of ω.
4.5.0.1 Computation of J

The following theorem generalizes equality (14) in Cook (1986).

Theorem 4.3

Let θ̂_ω be the solutions to the p perturbed estimating equations h_ω(θ; y) = 0. Then,

    J = −ḣ⁻¹ Δ,    (4.33)

where

    ḣ = ∂h_ω(θ; y)/∂θ′ |_0    and    Δ = ∂h_ω(θ̂_ω; y)/∂ω′ |_0.

Proof of Theorem 4.3. By assumption, we have the following equality:

    h_ω(θ̂_ω; y) = 0.    (4.34)

Taking derivatives with respect to ω′ on both sides we find

    ∂h_ω(θ; y)/∂ω′ |_{θ=θ̂_ω} + ( ∂h_ω(θ; y)/∂θ′ |_{θ=θ̂_ω} )( ∂θ̂_ω/∂ω′ ) = 0,    (4.35)

where 0 is now a p × q matrix. Evaluating both sides at the original model we find

    Δ + ḣJ = 0,

and the theorem follows directly. □
Theorem 4.3 has two implications for computation: the gradient vector
is easy to calculate, and the acceleration matrix is easy to calculate provided
s_γ = 0. Neither ḣ nor Δ requires that the solutions to the estimating equations
be closed-form for either the perturbed or the unperturbed inferences. In addition,
the matrix ḣ is often computed as part of the estimation process. Not only is
it a by-product of using the Newton-Raphson root-finding technique to solve the
estimating equations, but −ḣ⁻¹ serves as an estimate of the asymptotic variance of θ̂,
the inverse of (4.4). When estimation is performed by ML, −ḣ⁻¹ is the inverse of
the observed information matrix. The matrix Δ is also easily computed provided
that the expressions for the perturbed estimating functions are known. This
computation was demonstrated many times in Chapters 1-3 for maximum likelihood.
Computation of Δ could be difficult when the estimating equations themselves are
not closed-form. An example would be ML inference for generalized linear mixed
models (McCulloch 1997).
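Theorem 4.3 can be checked numerically. The sketch below uses a case-weight perturbation of ordinary least squares (an illustrative scheme of ours): h_ω(β; y) = X′ diag(ω)(y − Xβ), for which ḣ = −X′X and Δ = X′ diag(e), so J = (X′X)⁻¹X′ diag(e); a finite-difference derivative of β̂_ω agrees.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

def beta_hat(w):
    # Solution of the case-weight perturbed normal equations
    # X' diag(w) (y - X b) = 0
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

w0 = np.ones(n)                       # null perturbation: all weights equal one
e = y - X @ beta_hat(w0)              # residuals at the null perturbation

# Theorem 4.3: J = -hdot^{-1} Delta with hdot = -X'X and Delta = X' diag(e),
# so J = (X'X)^{-1} X' diag(e) for this scheme.
J = np.linalg.solve(X.T @ X, X.T * e)

# Finite-difference check of d beta_hat / d omega'
eps = 1e-6
J_fd = np.column_stack([(beta_hat(w0 + eps * d) - beta_hat(w0 - eps * d)) / (2 * eps)
                        for d in np.eye(n)])
assert np.allclose(J, J_fd, atol=1e-5)
```

Note that the check never needed a closed form for β̂_ω itself, only for the perturbed estimating functions, which is the point of the theorem.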
4.5.1 Computation of P

In this section, a formula for P is given so that it, like J, can be computed
without re-estimating the model. To begin, we present the following lemma, which
provides an identity used to construct P.

Lemma 4.2

Let θ̂_ω be the solutions to the p perturbed estimating equations h_ω(θ; y) = 0. Then,

    [ḣ_i′ ⊗ I_q] P = −[I_q  J′] Ψ_i [I_q  J′]′,    i = 1, ..., p,

where ḣ_i′ is the i-th row of ḣ,

    Ψ_i = [ Q_i   R_i
            R_i′  ḧ_i ],    (4.36)

    Q_i = ∂²h_{iω}(θ̂; y)/∂ω∂ω′ |_0,    R_i = ∂²h_{iω}(θ; y)/∂ω∂θ′ |_0,
    ḧ_i = ∂²h_{iω}(θ; y)/∂θ∂θ′ |_0,

and J = −ḣ⁻¹Δ.
Proof of Lemma 4.2. Taking the i-th row of (4.35) in the proof of Theorem 4.3
gives the following equality:

    ∂h_{iω}(θ; y)/∂ω′ |_{θ=θ̂_ω} + ( ∂h_{iω}(θ̂_ω; y)/∂θ̂_ω′ )( ∂θ̂_ω/∂ω′ ) = 0′,    (4.37)

where 0′ is a 1 × q matrix. Taking derivatives of the first term with respect to ω,
we find

    ∂/∂ω [ ∂h_{iω}(θ; y)/∂ω′ |_{θ=θ̂_ω} ]
        = ∂²h_{iω}(θ; y)/∂ω∂ω′ |_{θ=θ̂_ω}
          + ( ∂θ̂_ω/∂ω′ )′ ( ∂²h_{iω}(θ; y)/∂θ∂ω′ |_{θ=θ̂_ω} ).    (4.38)

Taking partial derivatives of the second term with respect to ω, we find

    ∂/∂ω [ ( ∂h_{iω}(θ̂_ω; y)/∂θ̂_ω′ )( ∂θ̂_ω/∂ω′ ) ]
        = [ ∂²h_{iω}(θ; y)/∂ω∂θ′ |_{θ=θ̂_ω}
            + ( ∂θ̂_ω/∂ω′ )′ ( ∂²h_{iω}(θ; y)/∂θ∂θ′ |_{θ=θ̂_ω} ) ] ( ∂θ̂_ω/∂ω′ )
          + Σ_{j=1}^p ( ∂h_{iω}(θ̂_ω; y)/∂θ̂_{jω} )( ∂²θ̂_{jω}/∂ω∂ω′ ).    (4.39)

Using (4.38) and (4.39), the next step is to take derivatives of the equality in (4.37)
with respect to ω and then evaluate the result at the null perturbation. It follows
that

    0 = Q_i + J′R_i′ + R_iJ + J′ḧ_iJ + Σ_{j=1}^p ḣ_{ij} P_j
      = [I_q  J′] Ψ_i [I_q  J′]′ + [ḣ_i′ ⊗ I_q] P,    (4.40)

where 0 is now a q × q matrix. The lemma follows immediately. □
107
The theorem below now provides a formula for P that does not require
reestimation of the model.
Theorem 4.4 *^ ^ s :
Let 0^ he the solutions to the p perturbed estimating equations hu{0; y) = 0. Then,
P =
1
h ®[la J']
*
/
J
(4.41)
where
* =
*i
*.
(4.42)
Proof of Theorem 4.4. Stacking the p equalities of the form given in
Lemma 4.2, we find
\[
\begin{bmatrix} \dot h_1' \otimes I_q \\ \vdots \\ \dot h_p' \otimes I_q \end{bmatrix} P
= -\begin{bmatrix} [\,I_q \;\; J'\,]\,\Psi_1 \\ \vdots \\ [\,I_q \;\; J'\,]\,\Psi_p \end{bmatrix}
\begin{bmatrix} I_q \\ J \end{bmatrix}. \tag{4.43}
\]
Manipulation of the left side of the equality proceeds as:
\[
\begin{bmatrix} \dot h_1' \otimes I_q \\ \vdots \\ \dot h_p' \otimes I_q \end{bmatrix} P
= \begin{bmatrix} \dot h_{11} I_q & \cdots & \dot h_{1p} I_q \\ \vdots & & \vdots \\ \dot h_{p1} I_q & \cdots & \dot h_{pp} I_q \end{bmatrix} P
= \left(\dot h \otimes I_q\right) P.
\]
Manipulation of the right side of the equality proceeds as:
\[
-\begin{bmatrix} [\,I_q \;\; J'\,] & & \\ & \ddots & \\ & & [\,I_q \;\; J'\,] \end{bmatrix}
\begin{bmatrix} \Psi_1 \\ \vdots \\ \Psi_p \end{bmatrix}
\begin{bmatrix} I_q \\ J \end{bmatrix}
= -\left(I_p \otimes [\,I_q \;\; J'\,]\right)\Psi \begin{bmatrix} I_q \\ J \end{bmatrix}.
\]
It then follows from (4.43) that
\[
P = -\left(\dot h^{-1} \otimes I_q\right)\left(I_p \otimes [\,I_q \;\; J'\,]\right)\Psi \begin{bmatrix} I_q \\ J \end{bmatrix}. \;\; \Box
\]
Corollary 4.2
\[
P_i = -\sum_{j=1}^{p} \dot h^{ij}\,[\,I_q \;\; J'\,]\,\Psi_j \begin{bmatrix} I_q \\ J \end{bmatrix}
= -\sum_{j=1}^{p} \dot h^{ij}\left(Q_j + R_j J + J'R_j' + J'\ddot h_j J\right),
\]
where \(\dot h^{ij}\) denotes the \(ij\)th element of \(\dot h^{-1}\).
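Theorem 4.4 can be checked numerically on a toy problem in which the perturbed estimator has a closed form. The sketch below is a hypothetical example, not one from the text: a normal mean estimated from the perturbed estimating function \(h_\omega(\theta) = \sum_i \omega_i(y_i - \theta)\), so that \(\hat\theta_\omega\) is a weighted mean whose second-derivative array can also be computed directly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
y = rng.normal(size=n)
r = y - y.mean()                      # residuals; theta_hat = ybar

# Building blocks at the null perturbation omega_0 = 1_n (p = 1, q = n):
hdot = np.array([[-float(n)]])        # dh/dtheta = -sum(omega_i)
Delta = r.reshape(1, n)               # dh_omega/domega' at (theta_hat, omega_0)
J = -np.linalg.inv(hdot) @ Delta      # J = -hdot^{-1} Delta  (1 x n)

# Psi_1 = [[Q_1, R_1'], [R_1, hddot_1]]: Q_1 = 0 (h is linear in omega),
# R_1 = d^2 h / domega dtheta' = -1_n, hddot_1 = d^2 h / dtheta^2 = 0.
Q1 = np.zeros((n, n))
R1 = -np.ones((n, 1))
hddot1 = np.zeros((1, 1))
Psi = np.block([[Q1, R1], [R1.T, hddot1]])

# Theorem 4.4: P = -(hdot^{-1} (x) I_q)(I_p (x) [I_q J']) Psi [I_q; J]
IJ = np.hstack([np.eye(n), J.T])      # [I_q  J']
top = np.vstack([np.eye(n), J])       # [I_q; J]
P_formula = -np.kron(np.linalg.inv(hdot), np.eye(n)) @ (IJ @ Psi @ top)

# Direct Hessian of theta_hat_omega = sum(w*y)/sum(w):
# d^2 theta_hat / domega_i domega_j |_0 = -(r_i + r_j)/n^2
P_direct = -(r[:, None] + r[None, :]) / n**2

assert np.allclose(P_formula, P_direct)
```

With p = 1 the Kronecker products collapse to scalars, but the same code pattern extends to vector-valued estimating equations by stacking the \(\Psi_i\) blocks.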
4.5.1.1 Remarks
The formula for P can be combined with the results of Theorem 4.2 to
further revise the acceleration matrix:
\[
M = [\,I \;\; J'K'\,]\,\ddot m\,\begin{bmatrix} I \\ KJ \end{bmatrix}
+ [\,s_\gamma' \otimes J'\,]\,G\,J + [\,s_\gamma' K \otimes I_q\,]\,P
= [\,I \;\; J'K'\,]\,\ddot m\,\begin{bmatrix} I \\ KJ \end{bmatrix}
+ [\,s_\gamma' \otimes J'\,]\,G\,J
- \left[\,s_\gamma' K \dot h^{-1} \otimes [\,I_q \;\; J'\,]\,\right]\Psi \begin{bmatrix} I_q \\ J \end{bmatrix},
\]
where \(\ddot m\) denotes the second-derivative matrix of \(m(\omega, \gamma)\) evaluated at the null
perturbation, as in Theorem 4.2. Also, we may write
\[
[\,s_\gamma' K \otimes I_q\,]\,P
= \sum_{i=1}^{p}(s_\gamma' K)_i\,P_i
= -\sum_{j=1}^{p}\sum_{i=1}^{p}(s_\gamma' K)_i\,\dot h^{ij}\,[\,I_q \;\; J'\,]\,\Psi_j \begin{bmatrix} I_q \\ J \end{bmatrix} \tag{4.44}
\]
\[
= -\sum_{j=1}^{p}\left(s_\gamma' K \dot h^{(j)}\right)[\,I_q \;\; J'\,]\,\Psi_j \begin{bmatrix} I_q \\ J \end{bmatrix}, \tag{4.45}
\]
where \(\dot h^{(j)}\) denotes the \(j\)th column of \(\dot h^{-1}\). Hence, another form of the acceleration
is given by
\[
M = [\,I \;\; J'K'\,]\,\ddot m\,\begin{bmatrix} I \\ KJ \end{bmatrix}
+ J'\Big[\sum_{i} s_{\gamma i}\,G_i\Big] J
- \sum_{j=1}^{p}\left(s_\gamma' K \dot h^{(j)}\right)[\,I_q \;\; J'\,]\,\Psi_j \begin{bmatrix} I_q \\ J \end{bmatrix}, \tag{4.46}
\]
where \(G_i\) is the \(i\)th block of G. Note that in all computations, \(J = -\dot h^{-1}\Delta\), as per
Theorem 4.3.
4.6 Local Assessment for Maximum Likelihood Inference
In this section, some brief examples are given in which estimation is performed
by ML. The influence measures addressed include linear combinations of parameter
estimates, ratios of parameter estimates, the likelihood displacement, the profile
likelihood displacement, and the Kullback-Leibler divergence. Emphasis is placed on
constructing the building blocks of local influence. From Theorem 4.3, \(J = -\ddot L^{-1}\Delta\).
The individual measures determine \(\gamma(\hat\theta_\omega)\) and the remaining building blocks:
\[
s_\omega = \left.\frac{\partial m(\omega, \gamma(\hat\theta_\omega))}{\partial\omega}\right|_0,
\qquad
K = \left.\frac{\partial \gamma(\hat\theta_\omega)}{\partial\hat\theta_\omega'}\right|_0,
\qquad
s_\gamma = \left.\frac{\partial m(\omega, \gamma(\hat\theta_\omega))}{\partial\gamma(\hat\theta_\omega)}\right|_0.
\]
4.6.1 Linear Combinations
Consider a linear combination of the parameter estimates under perturbation:
\[
\gamma(\hat\theta_\omega) = x'\hat\theta_\omega.
\]
The building blocks of first-order local assessment are:
\[
s_\omega = 0, \qquad K = x', \qquad s_\gamma = 1.
\]
Construction of the gradient and directional derivative is immediate:
\[
\dot S = \dot m' v = -x'\ddot L^{-1}\Delta\, v.
\]
Note that the direction of maximum absolute slope is proportional to \(\dot m \propto \Delta'\ddot L^{-1}x\).
This is exactly the same as the direction of largest curvature for the profile likelihood
displacement of \(x'\hat\theta_\omega\) under the orthogonal parameterization given in Theorem 2.1.
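As a numerical illustration, the gradient \(-\Delta'\ddot L^{-1}x\) can be checked against finite differences of \(x'\hat\beta_\omega\). The sketch below assumes a case-weight perturbation of a normal linear regression with known \(\sigma^2\), for which \(\ddot L = -X'X/\sigma^2\) and \(\Delta = X'D(r)/\sigma^2\); these forms are assumptions for the example, not quotations from Section 2.5.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -1.0])
sigma2 = 1.0
y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

bhat = np.linalg.solve(X.T @ X, X.T @ y)
r = y - X @ bhat
x = np.array([1.0, 0.5, -0.5])        # the combination of interest: m = x' beta_hat_omega

Ldd = -(X.T @ X) / sigma2             # Hessian of the log-likelihood (negative definite)
Delta = (X.T * r) / sigma2            # p x n: column i is r_i x_i / sigma^2
grad_analytic = -(Delta.T @ np.linalg.solve(Ldd, x))   # J'x = -Delta' Ldd^{-1} x

# Finite-difference check: beta_hat(omega) solves the omega-weighted normal equations
def m_of(w):
    B = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return x @ B

eps = 1e-6
grad_fd = np.empty(n)
for i in range(n):
    w = np.ones(n); w[i] += eps
    m_plus = m_of(w)
    w[i] -= 2 * eps
    grad_fd[i] = (m_plus - m_of(w)) / (2 * eps)

assert np.allclose(grad_fd, grad_analytic, atol=1e-6)
```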
4.6.2 Ratio of Estimates
For the measure \(m = \hat\theta_{i\omega}/\hat\theta_i\), the building blocks for a first-order local
assessment are
\[
\gamma(\hat\theta_\omega) = \hat\theta_{i\omega}, \qquad s_\omega = 0, \qquad K = d_i', \qquad s_\gamma = \frac{1}{\hat\theta_i},
\]
where \(d_i\) is a \(p \times 1\) vector with a 1 in the \(i\)th position and zeroes elsewhere. Hence,
\[
\dot S = \dot m'v = (s_\omega + J'K's_\gamma)'v
= -\frac{1}{\hat\theta_i}\,d_i'\ddot L^{-1}\Delta\,v
= -\frac{1}{\hat\theta_i}\,\ell^{i\prime}\Delta\,v,
\]
where \(\ell^i\) is the \(i\)th column of \(\ddot L^{-1}\).
4.6.3 Likelihood Displacement
Recall that the likelihood displacement is given by
\[
m(\omega, \gamma(\hat\theta_\omega)) = 2\left[L(\hat\theta; y) - L(\hat\theta_\omega; y)\right].
\]
This influence measure requires a second-order local assessment, as is now shown.
First, it does not depend on \(\omega\) explicitly, so \(s_\omega\) vanishes. Since the exact form of the
likelihood has not been specified, we simply write \(\gamma(\hat\theta_\omega) = \hat\theta_\omega\). It follows that
\[
s_\gamma = \left.\frac{\partial m(\omega, \gamma(\hat\theta_\omega))}{\partial\gamma(\hat\theta_\omega)}\right|_0
= -2\left.\frac{\partial L(\hat\theta_\omega; y)}{\partial\hat\theta_\omega}\right|_0
= -2\left.\frac{\partial L(\theta; y)}{\partial\theta}\right|_{\hat\theta} = 0.
\]
Hence, the gradient vector is zero. Proceeding with a second-order assessment, we
first note that \(K = I_p\). Because \(s_\gamma = 0\), the terms involving G and P vanish, and it
follows that
\[
M = [\,I \;\; J'K'\,]\begin{bmatrix} 0 & 0 \\ 0 & -2\ddot L \end{bmatrix}\begin{bmatrix} I \\ KJ \end{bmatrix}
= J'(-2\ddot L)J
= \Delta'\ddot L^{-1}(-2\ddot L)\ddot L^{-1}\Delta
= -2\Delta'\ddot L^{-1}\Delta.
\]
This is equivalent to the result given in Section 2.1, as per Cook (1986).
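The acceleration matrix can be computed directly in a toy example. The sketch below (an illustrative setup, not from the text) assumes a normal sample with known variance under case weights, where \(\ddot L = -n/\sigma^2\) and the \(i\)th column of \(\Delta\) is \(r_i/\sigma^2\); the direction of largest curvature is then proportional to the residual vector.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma2 = 20, 1.0
y = rng.normal(loc=3.0, scale=1.0, size=n)
r = y - y.mean()

Ldd = np.array([[-n / sigma2]])       # Hessian for the mean of a N(theta, sigma^2) sample
Delta = (r / sigma2).reshape(1, n)    # case-weight perturbation: Delta_i = r_i / sigma^2

M = -2 * Delta.T @ np.linalg.solve(Ldd, Delta)   # acceleration matrix: 2 r r' / (n sigma^2)

eigval, eigvec = np.linalg.eigh(M)    # ascending eigenvalues
C_max, v_max = eigval[-1], eigvec[:, -1]

assert np.allclose(M, 2 * np.outer(r, r) / (n * sigma2))
assert np.isclose(C_max, 2 * r @ r / (n * sigma2))
assert np.allclose(np.abs(v_max), np.abs(r) / np.linalg.norm(r))
```

The rank-one structure of M here makes the maximal curvature direction exactly proportional to the residuals, which is the familiar case-weight result.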
4.6.4 Profile Likelihood Displacement
The profile likelihood displacement,
\[
m(\omega, \gamma(\hat\theta_\omega)) = 2\left[L(\hat\theta_1, \hat\theta_2; y) - L(\hat\theta_{1\omega}, g(\hat\theta_{1\omega}); y)\right],
\]
also requires a second-order assessment. First, the influence measure does not
depend on \(\omega\) explicitly, so \(s_\omega\) vanishes. Also,
\[
s_\gamma = \left.\frac{\partial m(\omega, \gamma(\hat\theta_\omega))}{\partial\gamma(\hat\theta_\omega)}\right|_0
= -2\left.\frac{\partial L(\gamma(\hat\theta_\omega); y)}{\partial\gamma(\hat\theta_\omega)}\right|_0
= -2\left.\frac{\partial L(\theta; y)}{\partial\theta}\right|_{\hat\theta} = 0.
\]
This implies that the gradient vector is zero.
The next step is to construct the building blocks of a second-order assessment.
In order to obtain a formula for K, we present the following result, which appeared
in Cook (1986) as equation (24):
\[
\left.\frac{\partial g(\theta_1)}{\partial\theta_1'}\right|_{\hat\theta_1} = -\ddot L_{22}^{-1}\ddot L_{21}, \tag{4.47}
\]
where
\[
\ddot L = \begin{bmatrix} \ddot L_{11} & \ddot L_{12} \\ \ddot L_{21} & \ddot L_{22} \end{bmatrix}.
\]
Using this result, we find
\[
K = \left.\frac{\partial \gamma(\hat\theta_\omega)}{\partial\hat\theta_\omega'}\right|_0
= \begin{bmatrix} I & 0 \\ -\ddot L_{22}^{-1}\ddot L_{21} & 0 \end{bmatrix}. \tag{4.48}
\]
It follows that the acceleration matrix for the profile likelihood displacement
is
\[
M^{[\theta_1]} = [\,I \;\; J'K'\,]\begin{bmatrix} 0 & 0 \\ 0 & -2\ddot L \end{bmatrix}\begin{bmatrix} I \\ KJ \end{bmatrix}
= -2J'K'\ddot L K J.
\]
Since
\[
K'\ddot L K = \begin{bmatrix} \ddot L_{11} - \ddot L_{12}\ddot L_{22}^{-1}\ddot L_{21} & 0 \\ 0 & 0 \end{bmatrix}
\qquad\text{and}\qquad
\ddot L^{-1}\,K'\ddot L K\,\ddot L^{-1} = \ddot L^{-1} - B_{22},
\quad\text{where}\quad
B_{22} = \begin{bmatrix} 0 & 0 \\ 0 & \ddot L_{22}^{-1} \end{bmatrix},
\]
we obtain
\[
M^{[\theta_1]} = -2\Delta'\ddot L^{-1}K'\ddot L K\ddot L^{-1}\Delta
= -2\Delta'\left(\ddot L^{-1} - B_{22}\right)\Delta.
\]
This is equivalent to the formula given in Section 2.1, as per Cook (1986).
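The equivalence of the two forms of \(M^{[\theta_1]}\) rests on a partitioned-inverse identity, and it can be verified numerically for a randomly generated negative definite matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
p, p1, q = 5, 2, 4
A = rng.normal(size=(p, p))
Ldd = -(A @ A.T + p * np.eye(p))      # a random negative definite "Hessian"
Delta = rng.normal(size=(p, q))

L21, L22 = Ldd[p1:, :p1], Ldd[p1:, p1:]

# Route 1: M = -2 J' K' Ldd K J with J = -Ldd^{-1} Delta and
#          K = [[I, 0], [-L22^{-1} L21, 0]]
J = -np.linalg.solve(Ldd, Delta)
K = np.zeros((p, p))
K[:p1, :p1] = np.eye(p1)
K[p1:, :p1] = -np.linalg.solve(L22, L21)
M1 = -2 * J.T @ K.T @ Ldd @ K @ J

# Route 2: Cook's closed form M = -2 Delta' (Ldd^{-1} - B22) Delta,
#          B22 = [[0, 0], [0, L22^{-1}]]
B22 = np.zeros((p, p))
B22[p1:, p1:] = np.linalg.inv(L22)
M2 = -2 * Delta.T @ (np.linalg.inv(Ldd) - B22) @ Delta

assert np.allclose(M1, M2)
```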
4.6.5 Kullback-Leibler Divergence
The Kullback-Leibler divergence,
\[
m = \mathrm{E}_{\hat\theta}\left[\log\frac{f(y|\hat\theta)}{f(y|\hat\theta_\omega)}\right],
\]
also requires a second-order assessment. The measure does not depend on \(\omega\)
explicitly, so \(s_\omega\) vanishes. Also, \(\gamma(\hat\theta_\omega) = \hat\theta_\omega\), implying
\[
s_\gamma = \left.\frac{\partial m(\omega, \gamma(\hat\theta_\omega))}{\partial\gamma(\hat\theta_\omega)}\right|_0
= -\mathrm{E}_{\hat\theta}\left[\left.\frac{\partial L(\theta; y)}{\partial\theta}\right|_{\hat\theta}\,\right]
= 0,
\]
since the expected value of the score, taken under the same parameter value at
which it is evaluated, is zero.
Proceeding with a second-order assessment, we first note that \(K = I_p\). We
also make the following observation:
\[
\frac{\partial^2 m}{\partial\gamma\,\partial\gamma'}
= -\mathrm{E}_{\hat\theta}\left[\frac{\partial^2 L(\gamma; y)}{\partial\gamma\,\partial\gamma'}\right].
\]
Evaluating this at the null perturbation implies
\[
s_{\gamma\gamma} = -\mathrm{E}_{\hat\theta}\left[\left.\frac{\partial^2 L(\theta; y)}{\partial\theta\,\partial\theta'}\right|_{\hat\theta}\,\right], \tag{4.49}
\]
and the acceleration matrix follows:
\[
M = [\,I \;\; J'K'\,]\begin{bmatrix} 0 & 0 \\ 0 & s_{\gamma\gamma} \end{bmatrix}\begin{bmatrix} I \\ KJ \end{bmatrix}
= J's_{\gamma\gamma}J
= -\Delta'\ddot L^{-1}\,\mathrm{E}_{\hat\theta}\left[\left.\frac{\partial^2 L(\theta; y)}{\partial\theta\,\partial\theta'}\right|_{\hat\theta}\,\right]\ddot L^{-1}\Delta. \tag{4.50}
\]
An interesting consequence of this result is the following: if the observed and
expected information matrices are equal, then the acceleration matrix is proportional
to the acceleration matrix of the likelihood displacement.
4.7 First-order Assessment of the Log Generalized Variance
In this section, some general results for first-order local assessment of the log
of the generalized variance are presented. In addition, the directional derivative for
the variance perturbation of a regression model under ML is derived.
4.7.1 General Form of the Directional Derivative
Theorem 4.5
For the measure
\[
m = \log\left|-\dot h_\omega(\hat\theta_\omega; y)^{-1}\right|, \tag{4.51}
\]
the \(k\)th element of the vector \(s_\omega\) is given by
\[
-\operatorname{trace}\left[\left.\frac{\partial^2 h_\omega(\theta; y)}{\partial\omega_k\,\partial\theta'}\right|_0\,\dot h^{-1}\right],
\qquad k = 1, \dots, q, \tag{4.52}
\]
and the \(j\)th element of the vector \(s_\gamma\) is given by
\[
-\operatorname{trace}\left[\left.\frac{\partial^2 h_\omega(\theta; y)}{\partial\theta_j\,\partial\theta'}\right|_0\,\dot h^{-1}\right],
\qquad j = 1, \dots, p. \tag{4.53}
\]
Additionally, we may write
\[
s_\omega = R\,\operatorname{vec}(-\dot h^{-1})
= \begin{bmatrix} R_1 & R_2 & \cdots & R_p \end{bmatrix}\operatorname{vec}(-\dot h^{-1}), \tag{4.54}
\]
where each \(R_j\) is a \(q \times p\) matrix of the form
\[
R_j = \left.\frac{\partial^2 h_{j\omega}(\theta; y)}{\partial\omega\,\partial\theta'}\right|_0,
\qquad j = 1, \dots, p.
\]
Similarly,
\[
s_\gamma = \ddot h\,\operatorname{vec}(-\dot h^{-1})
= \begin{bmatrix} \ddot h_1 & \ddot h_2 & \cdots & \ddot h_p \end{bmatrix}\operatorname{vec}(-\dot h^{-1}), \tag{4.55}
\]
where each \(\ddot h_i\) is a \(p \times p\) matrix of the form
\[
\ddot h_i = \left.\frac{\partial^2 h_{i\omega}(\theta; y)}{\partial\theta\,\partial\theta'}\right|_0.
\]
Note that the matrices \(R_i\) and \(\ddot h_i\) appeared earlier in the computational
formula for P (Theorem 4.4).
Proof of Theorem 4.5
To begin, we present the following two matrix differentiation results for a matrix A
and a scalar t (Searle et al. 1992):
\[
\frac{\partial A^{-1}}{\partial t} = -A^{-1}\frac{\partial A}{\partial t}A^{-1} \tag{4.56}
\]
\[
\frac{\partial}{\partial t}\log|A| = \operatorname{trace}\left[A^{-1}\frac{\partial A}{\partial t}\right]. \tag{4.57}
\]
Combining these two identities yields
\[
\frac{\partial}{\partial t}\log\left|-A^{-1}\right|
= \operatorname{trace}\left[(-A)\,\frac{\partial(-A^{-1})}{\partial t}\right]
= \operatorname{trace}\left[-A^{-1}\frac{\partial A}{\partial t}\right]
= -\operatorname{trace}\left[A^{-1}\frac{\partial A}{\partial t}\right]. \tag{4.58}
\]
Using (4.58), we find
\[
\frac{\partial}{\partial\omega_k}m(\omega, \hat\theta_\omega)
= -\operatorname{trace}\left[\left.\frac{\partial^2 h_\omega(\theta; y)}{\partial\omega_k\,\partial\theta'}\right|_{\theta=\hat\theta_\omega}
\left(\left.\frac{\partial h_\omega(\theta; y)}{\partial\theta'}\right|_{\theta=\hat\theta_\omega}\right)^{-1}\right],
\]
and evaluating at \(\omega = \omega_0\) and \(\theta = \hat\theta\) yields (4.52). Expression (4.53) follows from a
similar argument.
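Identity (4.58) is easy to verify numerically against a finite-difference derivative; the sketch below uses a randomly generated negative definite matrix so that \(\log|-A^{-1}|\) is well defined:

```python
import numpy as np

rng = np.random.default_rng(4)
p = 4
A0 = rng.normal(size=(p, p)); A0 = -(A0 @ A0.T + p * np.eye(p))  # negative definite
B = rng.normal(size=(p, p)); B = B + B.T                          # direction dA/dt

def logdet_neg_inv(t):
    A = A0 + t * B
    sign, val = np.linalg.slogdet(np.linalg.inv(-A))
    return val

eps = 1e-6
fd = (logdet_neg_inv(eps) - logdet_neg_inv(-eps)) / (2 * eps)
analytic = -np.trace(np.linalg.solve(A0, B))   # -trace(A^{-1} dA/dt)

assert np.isclose(fd, analytic, atol=1e-6)
```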
The next step is construction of (4.54). First, the trace of the product of two matrices
\(A_{r \times c}\) and \(B_{c \times r}\) is (Searle 1982)
\[
\operatorname{trace}(AB) = \sum_{i=1}^{r}\sum_{j=1}^{c} a_{ij}b_{ji}.
\]
Letting \(\dot h^{ij}\) denote the \(ij\)th element of \(\dot h^{-1}\), we may write (4.52) as
\[
s_{\omega, k} = -\sum_{i=1}^{p}\sum_{j=1}^{p}\left.\frac{\partial^2 h_{i\omega}(\theta; y)}{\partial\omega_k\,\partial\theta_j}\right|_0\,\dot h^{ji}.
\]
Stacking these elements for \(k = 1, \dots, q\) and collecting terms by the columns of
\(-\dot h^{-1}\) gives
\[
s_\omega = \begin{bmatrix} R_1 & R_2 & \cdots & R_p \end{bmatrix}\operatorname{vec}(-\dot h^{-1}).
\]
Construction of \(s_\gamma\) proceeds similarly. □
Corollary 4.3
For the measure (4.51), the slope in direction v is
\[
\dot S = \operatorname{vec}(-\dot h^{-1})'\left[R' + \ddot h' J\right]v, \tag{4.59}
\]
and the direction of maximum absolute slope is
\[
v_{\max} \propto \dot m = \left[R + J'\ddot h\right]\operatorname{vec}(-\dot h^{-1}). \tag{4.60}
\]
4.7.2 Variance Perturbation for ML Regression
In this section, first-order local assessment diagnostics are derived for the
variance perturbation in ML regression, with the perturbed generalized variance
as the objective measure. Because \(\dot h\) has a simple form, we derive \(s_\omega\) and \(s_\gamma\)
element-by-element rather than construct the matrices in (4.54) and (4.55).
4.7.2.1 Construction of \(s_\omega\)
To derive the \(i\)th element of \(s_\omega\), we begin with the following results from
Section 2.5:
\[
\frac{\partial^2 L_\omega(\theta; y)}{\partial\beta\,\partial\omega_i} = \frac{1}{\sigma^2}(y_i - x_i'\beta)x_i \tag{4.61}
\]
\[
\frac{\partial^2 L_\omega(\theta; y)}{\partial\sigma^2\,\partial\omega_i} = \frac{1}{2\sigma^4}(y_i - x_i'\beta)^2, \tag{4.61a}
\]
where \(x_i\) is the covariate vector for the \(i\)th observation. Taking partial derivatives of
(4.61) leads to
\[
\frac{\partial^3 L_\omega(\theta; y)}{\partial\beta\,\partial\beta'\,\partial\omega_i} = -\frac{1}{\sigma^2}x_ix_i',
\qquad
\frac{\partial^3 L_\omega(\theta; y)}{\partial\sigma^2\,\partial\beta\,\partial\omega_i} = -\frac{1}{\sigma^4}(y_i - x_i'\beta)x_i.
\]
Taking partial derivatives of (4.61a) yields
\[
\frac{\partial^3 L_\omega(\theta; y)}{\partial(\sigma^2)^2\,\partial\omega_i} = -\frac{1}{\sigma^6}(y_i - x_i'\beta)^2.
\]
After evaluating these expressions at the null perturbation, we can construct the
matrix
\[
\left.\frac{\partial \dot h_\omega(\theta; y)}{\partial\omega_i}\right|_0
= \left.\frac{\partial^3 L_\omega(\theta; y)}{\partial\theta\,\partial\theta'\,\partial\omega_i}\right|_0
= \begin{bmatrix} -\dfrac{1}{\hat\sigma^2}x_ix_i' & -\dfrac{r_i}{\hat\sigma^4}x_i \\[4pt] -\dfrac{r_i}{\hat\sigma^4}x_i' & -\dfrac{r_i^2}{\hat\sigma^6} \end{bmatrix}, \tag{4.62}
\]
where \(r_i = y_i - x_i'\hat\beta\).
Accordingly, the trace of the following matrix product is the \(i\)th element of \(s_\omega\):
\[
\left.\frac{\partial \dot h_\omega(\theta; y)}{\partial\omega_i}\right|_0\left(-\dot h^{-1}\right)
= \begin{bmatrix} -\dfrac{1}{\hat\sigma^2}x_ix_i' & -\dfrac{r_i}{\hat\sigma^4}x_i \\[4pt] -\dfrac{r_i}{\hat\sigma^4}x_i' & -\dfrac{r_i^2}{\hat\sigma^6} \end{bmatrix}
\begin{bmatrix} \hat\sigma^2(X'X)^{-1} & 0 \\ 0 & \dfrac{2\hat\sigma^4}{n} \end{bmatrix}
= \begin{bmatrix} -x_ix_i'(X'X)^{-1} & -\dfrac{2r_i}{n}x_i \\[4pt] -\dfrac{r_i}{\hat\sigma^2}x_i'(X'X)^{-1} & -\dfrac{2r_i^2}{n\hat\sigma^2} \end{bmatrix}. \tag{4.63}
\]
In order to find a formula for the trace, an expression for the \(k\)th diagonal element
of the upper left portion of the matrix is needed. Letting \(a_{jk}\) be the \(jk\)th element of
\((X'X)^{-1}\), the \(k\)th diagonal element of \(x_ix_i'(X'X)^{-1}\) is
\[
x_{ik}\sum_{j=1}^{p-1}x_{ij}a_{jk}.
\]
Hence, the trace of (4.63) is given by
\[
-\sum_{k=1}^{p-1}\sum_{j=1}^{p-1}x_{ij}x_{ik}a_{jk} - \frac{2r_i^2}{n\hat\sigma^2}
= -x_i'(X'X)^{-1}x_i - \frac{2r_i^2}{n\hat\sigma^2}.
\]
From this, the vector \(s_\omega\) can finally be constructed. Letting \(a_j\) be the \(j\)th column
of \((X'X)^{-1}\), \(X_j\) the \(j\)th column of X, \(D(X_j)\) the diagonal matrix formed from \(X_j\),
and \(r_{sq}\) the \(n \times 1\) vector of squared residuals, the elements \(-x_i'(X'X)^{-1}x_i =
-\sum_j x_{ij}(x_i'a_j)\) stack into
\[
s_\omega = -\begin{bmatrix} D(X_1)X & \cdots & D(X_{p-1})X \end{bmatrix}\operatorname{vec}(X'X)^{-1} - \frac{2}{n\hat\sigma^2}\,r_{sq}
= -\begin{bmatrix} D(X_1) & \cdots & D(X_{p-1}) \end{bmatrix}\left[I_{p-1} \otimes X\right]\operatorname{vec}(X'X)^{-1} - \frac{2}{n\hat\sigma^2}\,r_{sq}. \tag{4.64}
\]
4.7.2.2 Construction of \(s_\gamma\)
Starting from the Hessian of the unperturbed log-likelihood,
\[
\dot h(\theta; y) = \frac{\partial^2 L(\theta; y)}{\partial\theta\,\partial\theta'}
= \begin{bmatrix} -\dfrac{1}{\sigma^2}X'X & -\dfrac{1}{\sigma^4}X'(y - X\beta) \\[4pt]
-\dfrac{1}{\sigma^4}(y - X\beta)'X & \dfrac{n}{2\sigma^4} - \dfrac{1}{\sigma^6}(y - X\beta)'(y - X\beta) \end{bmatrix}, \tag{4.65}
\]
the following partial derivatives at the null perturbation can be computed:
\[
\left.\frac{\partial^3 L(\theta; y)}{\partial\beta_i\,\partial\beta\,\partial\beta'}\right|_{\hat\theta} = 0,
\qquad
\left.\frac{\partial^3 L(\theta; y)}{\partial\sigma^2\,\partial\beta\,\partial\beta'}\right|_{\hat\theta} = \frac{1}{\hat\sigma^4}X'X,
\]
\[
\left.\frac{\partial^3 L(\theta; y)}{\partial(\sigma^2)^2\,\partial\beta}\right|_{\hat\theta} = \frac{2}{\hat\sigma^6}X'(y - X\hat\beta) = 0,
\qquad
\left.\frac{\partial^3 L(\theta; y)}{\partial(\sigma^2)^3}\right|_{\hat\theta} = -\frac{n}{\hat\sigma^6} + \frac{3r'r}{\hat\sigma^8} = \frac{2n}{\hat\sigma^6}.
\]
Now the elements of \(s_\gamma\) can be constructed. For \(i = 1, \dots, p-1\),
\[
s_{\gamma, i} = \frac{\partial m(\omega, \hat\theta_\omega)}{\partial\beta_i}
= \operatorname{trace}\left(\begin{bmatrix} 0 & \dfrac{1}{\hat\sigma^4}X'Xd_i \\[4pt] \dfrac{1}{\hat\sigma^4}d_i'X'X & 0 \end{bmatrix}
\begin{bmatrix} \hat\sigma^2(X'X)^{-1} & 0 \\ 0 & \dfrac{2\hat\sigma^4}{n} \end{bmatrix}\right) = 0, \tag{4.66}
\]
since the product has zero diagonal blocks. Also,
\[
s_{\gamma, p} = \frac{\partial m(\omega, \hat\theta_\omega)}{\partial\sigma^2}
= \operatorname{trace}\left(\begin{bmatrix} \dfrac{1}{\hat\sigma^4}X'X & 0 \\ 0 & \dfrac{2n}{\hat\sigma^6} \end{bmatrix}
\begin{bmatrix} \hat\sigma^2(X'X)^{-1} & 0 \\ 0 & \dfrac{2\hat\sigma^4}{n} \end{bmatrix}\right)
= \frac{p-1}{\hat\sigma^2} + \frac{4}{\hat\sigma^2} = \frac{p+3}{\hat\sigma^2}. \tag{4.67}
\]
Hence, all but the last element of \(s_\gamma\) is zero:
\[
s_\gamma = \begin{bmatrix} 0 & \cdots & 0 & \dfrac{p+3}{\hat\sigma^2} \end{bmatrix}'.
\]
4.7.2.3 Directional derivative
Using the results above and \(\Delta\) from Section 2.5, we may finally construct the
directional derivative of the log of the perturbed generalized variance:
\[
\dot S = (s_\omega + J'K's_\gamma)'v = (s_\omega - \Delta'\ddot L^{-1}s_\gamma)'v,
\]
with
\[
\Delta = \begin{bmatrix} \dfrac{1}{\hat\sigma^2}X'D(r) \\[4pt] \dfrac{1}{2\hat\sigma^4}r_{sq}' \end{bmatrix},
\qquad
-\Delta'\ddot L^{-1}s_\gamma
= \Delta'\begin{bmatrix} \hat\sigma^2(X'X)^{-1} & 0 \\ 0 & \dfrac{2\hat\sigma^4}{n} \end{bmatrix}
\begin{bmatrix} 0 \\ \vdots \\ \dfrac{p+3}{\hat\sigma^2} \end{bmatrix}
= \frac{p+3}{n\hat\sigma^2}\,r_{sq}.
\]
Combining this with (4.64) gives
\[
\dot S = \left(-\begin{bmatrix} D(X_1) & \cdots & D(X_{p-1}) \end{bmatrix}\left[I_{p-1} \otimes X\right]\operatorname{vec}(X'X)^{-1} + \frac{p+1}{n\hat\sigma^2}\,r_{sq}\right)'v. \tag{4.68}
\]
Hence, the direction of the maximum absolute slope is
\[
v_{\max} \propto -\begin{bmatrix} D(X_1) & \cdots & D(X_{p-1}) \end{bmatrix}\left[I_{p-1} \otimes X\right]\operatorname{vec}(X'X)^{-1} + \frac{p+1}{n\hat\sigma^2}\,r_{sq}. \tag{4.69}
\]
4.7.2.4 Simple random sample
As a special case, consider a normally distributed simple random sample
(SRS). With \(X = 1_n\) and \(p = 2\), the direction of maximum slope simplifies to
\[
v_{\max} \propto -D(1_n)\left[1 \otimes 1_n\right]\operatorname{vec}\left(\tfrac{1}{n}\right) + \frac{3}{n\hat\sigma^2}\,r_{sq}
= -\frac{1}{n}1_n + \frac{3}{n\hat\sigma^2}\,r_{sq}
\propto r_{sq} - \frac{\hat\sigma^2}{3}1_n. \tag{4.70}
\]
Hence, observations with squared residuals that deviate greatly from \(\hat\sigma^2/3\) have large
elements in \(v_{\max}\). This may seem odd, but recall that this represents influence on
an estimated joint asymptotic confidence region for \(\hat\mu\) and \(\hat\sigma^2\). If the log of the
Figure 4.1: Hills data; \(v_{\max}\) for perturbed log generalized variance
variance of these two estimates is considered separately, the directions of maximum
absolute slope are respectively given by
\[
v_{\max}^{[\hat\mu]} \propto r_{sq} - \hat\sigma^2 1_n,
\qquad
v_{\max}^{[\hat\sigma^2]} \propto r_{sq}.
\]
The direction of maximum slope for \(\hat\mu\) is dominated by squared residuals that
deviate greatly from the average squared residual, \(\hat\sigma^2\), while \(v_{\max}\) for \(\hat\sigma^2\) is dominated
by large squared residuals. The direction of maximum slope for the joint influence
measure is simply a weighted average of these two vectors.
4.7.2.5 Examples
As examples, we consider the Scottish hills data and the school data. For
the hills data, the element of \(v_{\max}\) for observation 11 is large and the element for
observation 18 is huge (Figure 4.1). For the school data, elements 3 and 18 show
substantial influence on the perturbed generalized variance (Figure 4.2).
Figure 4.2: School data; \(v_{\max}\) for perturbed log generalized variance
4.7.2.6 A simpler measure
The directional derivative simplifies greatly if the measure is based on the
unperturbed generalized variance evaluated at the perturbed estimates:
\[
m = \log\left|-\dot h(\hat\theta_\omega; y)^{-1}\right|.
\]
In this situation, \(s_\omega\) vanishes, but the other derivations follow from above, yielding
the following results for the variance perturbation:
\[
\dot S = \frac{p+3}{n\hat\sigma^2}\,r_{sq}'v,
\qquad
v_{\max} \propto r_{sq}.
\]
4.8 Local Assessment for Quasi-likelihood Inference
In this section, local influence assessment of quasi-likelihood estimates
(Wedderburn 1974) for independent observations is outlined. A thorough treatment
of quasi-likelihood inference can be found in McCullagh and Nelder (1989). Here, we
present only some fundamental concepts, viewed from the vantage point of optimal
estimating equations (Godambe and Heyde 1987; Desmond 1997a). Some local
influence techniques are then outlined.
4.8.1 Quasi-likelihood
Consider a model in which a set of n independent observations y depends
on a set of p parameters \(\theta\) through their means \(\mu(\theta)\). Although full distributional
assumptions are not made, we also assume that the variances of y depend on their
means: \(\mathrm{V}[y] = \sigma^2 V(\mu)\). A class of estimating functions that we may consider for
estimating \(\theta\) is given by
\[
h^*(\theta; y) = W(\theta)\left(y - \mu(\theta)\right), \tag{4.71}
\]
where the \(p \times n\) weight matrix \(W(\theta)\) can be a function of the unknown parameters.
Standardizing (4.71) as in (4.3) yields (Wedderburn 1974; Godambe and Heyde
1987):
\[
h(\theta; y) = \frac{\partial\mu(\theta)'}{\partial\theta}\,V(\mu)^{-1}\left(y - \mu(\theta)\right)/\sigma^2. \tag{4.72}
\]
These optimal estimating functions are called the quasi-score functions because
the solutions to \(h(\theta; y) = 0\) maximize the following "quasi-" likelihood function,
provided the integral exists:
\[
Q(\mu; y) = \sum_{i=1}^{n}\int_{y_i}^{\mu_i}\frac{y_i - t}{\sigma^2 V(t)}\,dt.
\]
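The defining property of the quasi-score, that it is the derivative of the quasi-likelihood with respect to the parameter, can be checked numerically. The sketch below uses a scalar parameter with a logit link and the binomial-type variance function \(V(t) = t(1-t)\) (an illustrative choice, not an example from the text); with this link-variance pairing the score reduces to \(y - \mu(\theta)\):

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import approx_fprime

y = 0.3
V = lambda t: t * (1 - t)              # binomial-type variance function
mu = lambda th: 1 / (1 + np.exp(-th))  # logit link, scalar parameter

def Q(th):
    # quasi-likelihood Q(mu; y) = int_y^mu (y - t) / V(t) dt  (sigma^2 = 1)
    val, _ = quad(lambda t: (y - t) / V(t), y, mu(th))
    return val

th0 = 0.4
# quasi-score: (dmu/dth) V(mu)^{-1} (y - mu); with the logit link dmu/dth = V(mu),
# so the score reduces to y - mu(th)
score = y - mu(th0)
fd = approx_fprime(np.array([th0]), lambda v: Q(v[0]), 1e-7)[0]

assert np.isclose(fd, score, atol=1e-5)
```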
4.8.2 Local Assessment
Some preliminary results for local assessment of quasi-likelihood models for
independent data are presented in this section. Assumptions about the model's
form are presented, some perturbation schemes are proposed, and a case-weight
perturbation is applied to an example data set.
4.8.2.1 Assumptions
It is assumed that observations are independent and that the variance is
related to the mean by the function \(V(\mu)\). In addition, it is assumed that the mean
is related to a linear predictor \(\eta = x'\beta\) through a link function, g:
\[
g(\mu) = \eta.
\]
4.8.2.2 Perturbation Schemes
One perturbation is to introduce case weights or, equivalently, variance
perturbations. A set of n perturbations can be introduced into the estimating
functions in a multiplicative manner:
\[
h_\omega(\theta; y) = \frac{\partial\mu'}{\partial\theta}\,D(\omega)\,V(\mu)^{-1}\left(y - \mu(\theta)\right)/\sigma^2.
\]
Writing the \(j\)th perturbed estimating function in summation notation shows how the
perturbations weight the contributions of the individual observations:
\[
h_{j\omega}(\theta; y) = \frac{1}{\sigma^2}\sum_{i=1}^{n}\omega_i\,\frac{\partial\mu_i}{\partial\theta_j}\,\frac{y_i - \mu_i}{V(\mu_i)}.
\]
The matrix \(\Delta\) is then given by
\[
\Delta = \left.\frac{\partial h_\omega(\theta; y)}{\partial\omega'}\right|_0
= \frac{1}{\hat\sigma^2}\,\frac{\partial\mu'}{\partial\theta}\,D\!\left(\frac{y - \mu}{V(\mu)}\right)\Bigg|_{\hat\theta},
\]
where \(D\!\left(\frac{y - \mu}{V(\mu)}\right)\) is a diagonal matrix with \(i\)th diagonal element \((y_i - \hat\mu_i)/V(\hat\mu_i)\).
If the response is continuous, then a perturbation can be made to assess
additive errors in measurement. The perturbed estimating functions are then
\[
h_\omega(\theta; y) = \frac{\partial\mu'}{\partial\theta}\,V(\mu)^{-1}\left(y + \omega - \mu(\theta)\right)/\sigma^2,
\]
which leads to
\[
\Delta = \left.\frac{\partial h_\omega(\theta; y)}{\partial\omega'}\right|_0
= \frac{1}{\hat\sigma^2}\,\frac{\partial\mu'}{\partial\theta}\,V(\mu)^{-1}\Bigg|_{\hat\theta}.
\]
Other perturbations can also be introduced, including errors in the explanatory
variables and perturbations to the linear predictor.
4.8.2.3 Example: Case Weights Applied to Leaf Blotch Data
As an example, we apply the case-weight perturbation to a data set
concerning leaf blotch (Wedderburn 1974; McCullagh and Nelder 1989). The
percentage of leaf area affected by the blotch was recorded for 10 varieties of barley
grown at 9 sites. Although the response is continuous, it can be analyzed as if arising
from binomial sampling, using the logit link function and a main effects model:
\[
V(\mu_i) = \mu_i(1 - \mu_i),
\qquad
g(\mu_i) = \log\frac{\mu_i}{1 - \mu_i} = \eta_i.
\]
Plotting the Pearson residuals from this model against the linear predictor indicates
that the model overestimates the variance for the largest and smallest proportions
(Figure 4.3). Applying the case-weight perturbation requires \(\dot h\) and the following
matrix:
\[
\Delta = \frac{1}{\hat\sigma^2}X'D(y - \hat\mu),
\]
where X is the model matrix and \(D(y - \hat\mu)\) is a diagonal matrix of the raw residuals.
To measure the effects of perturbation, we use the unperturbed Pearson chi-squared
statistic evaluated at the perturbed MLEs:
\[
m = \sum_{i=1}^{n}\frac{(y_i - \hat\mu_{i\omega})^2}{\hat\mu_{i\omega}(1 - \hat\mu_{i\omega})}.
\]
This implies that the final building block for a first-order assessment of this measure
is given by
\[
s_\gamma = \frac{\partial m}{\partial\theta} = X'z_1,
\]
where the \(n \times 1\) vector \(z_1\) has \(i\)th element
\[
z_{1i} = \frac{\hat\mu_i^2 - 2y_i\hat\mu_i^2 - y_i^2(1 - 2\hat\mu_i)}{\hat\mu_i(1 - \hat\mu_i)}.
\]
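The element formula for \(z_1\), the derivative of the Pearson statistic with respect to the linear predictor under the logit link, can be checked against a finite-difference derivative:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 8
y = rng.uniform(0.05, 0.95, size=n)
eta = rng.normal(size=n)
expit = lambda x: 1 / (1 + np.exp(-x))

def chisq(e):
    m = expit(e)
    return np.sum((y - m) ** 2 / (m * (1 - m)))

mu = expit(eta)
# z1: d/d(eta_i) of (y_i - mu_i)^2 / (mu_i (1 - mu_i)) with mu_i = expit(eta_i)
z1 = (mu**2 - 2 * y * mu**2 - y**2 * (1 - 2 * mu)) / (mu * (1 - mu))

eps = 1e-6
fd = np.empty(n)
for i in range(n):
    e = eta.copy(); e[i] += eps
    up = chisq(e)
    e[i] -= 2 * eps
    fd[i] = (up - chisq(e)) / (2 * eps)

assert np.allclose(fd, z1, atol=1e-5)
```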
The direction of maximum local influence is then
\[
v_{\max} \propto D(y - \hat\mu)\,X\,\dot h^{-1}X'z_1.
\]
Plotting the elements of \(v_{\max}\) against case indices indicates that elements 24 (site
3, variety 4) and 65 (site 7, variety 5) are influential (Figure 4.4). When plotting
elements against the corresponding linear predictor, it is clear that observations
with moderately sized proportions are more influential (Figure 4.5).
As in Wedderburn (1974) and McCullagh and Nelder (1989), we consider
fitting a model with an alternate variance function, \(V(\mu) = \mu^2(1 - \mu)^2\). This function
reflects the assumption that extreme proportions are more precise than those arising
from a binomial distribution. Pearson residuals indicate that the alternate variance
function is more realistic, but that a relationship between the residual variance
and the linear predictor remains (Figure 4.6). Implementing the case-weight
perturbation and assessing its effects on the Pearson chi-squared statistic requires
the following quantities:
\[
\Delta = \frac{1}{\hat\sigma^2}X'D\!\left(\frac{y - \hat\mu}{\hat\mu(1 - \hat\mu)}\right)
\qquad\text{and}\qquad
s_\gamma = \frac{\partial m}{\partial\theta} = X'z_2,
\]
where the \(n \times 1\) vector \(z_2\) has \(i\)th element
\[
z_{2i} = \frac{2\left(\hat\mu_i^3 - 3\hat\mu_i^2 y_i + \hat\mu_i(2y_i^2 + y_i) - y_i^2\right)}{\hat\mu_i^2(1 - \hat\mu_i)^2}.
\]
The direction of maximum local influence is then
\[
v_{\max} \propto D\!\left(\frac{y - \hat\mu}{\hat\mu(1 - \hat\mu)}\right)X\,\dot h^{-1}X'z_2.
\]
Figure 4.3: Leaf blotch data; Model I; Pearson residuals vs. linear predictor
A plot of \(v_{\max}\) indicates that only observation 24 is influential for this model
(Figure 4.7). The elements have less association with the linear predictor than in the
binomial model (Figure 4.8), indicating that the contribution of the observations to
inference is more appropriate in the second model.
Figure 4.4: Leaf blotch data; Model I; \(v_{\max}\). The first ten indices correspond to site
1, varieties 1-10, while the second ten indices correspond to site 2, varieties 1-10,
etc. The dotted vertical lines are present to help differentiate sites and varieties
within sites.
Figure 4.5: Leaf blotch data; Model I; \(v_{\max}\) vs. linear predictor
Figure 4.6: Leaf blotch data; Model II; Pearson residuals vs. linear predictor
Figure 4.7: Leaf blotch data; Model II; \(v_{\max}\). The first ten indices correspond to
site 1, varieties 1-10, while the second ten indices correspond to site 2, varieties
1-10, etc. The dotted vertical lines are present to help differentiate sites and
varieties within sites.
Figure 4.8: Leaf blotch data; Model II; \(v_{\max}\) vs. linear predictor
CHAPTER 5
LOCAL ASSESSMENT OF PREDICTIONS
In this chapter, local assessment in prediction problems is considered. First,
prediction is motivated by two methods: first principles and predictive likelihood.
This is followed by a brief discussion of influence measures for estimation and
prediction. Then, predictive inference and existing local influence techniques for
linear mixed models are reviewed. These techniques are then extended to assess the
effects of perturbations on predictions, and three perturbation schemes are applied
to an example data set.
5.1 Prediction via First Principles
Consider the problem of predicting a vector of random variables, u.
Among predictors \(u^*\), \(\bar u = \mathrm{E}(u)\) minimizes the mean squared error of prediction,
\(\mathrm{MSEP}(u^*) = \mathrm{E}(u - u^*)^2\). The standard error associated with this prediction
is \(\sqrt{\mathrm{MSEP}(\bar u)} = \sqrt{\mathrm{V}(u)}\). However, prediction is typically not so simple; it is
usually performed using information from a sample and in the presence of unknown
parameters.
5.1.1 Prediction After Data Are Observed
Suppose now that prediction of u takes place after observing data y. If
the two random vectors are correlated, then an improved inference can be made
about u. Here, the conditional mean, \(\tilde u = \mathrm{E}(u|y)\), can serve as a point prediction.
This prediction minimizes the conditional mean squared error of prediction,
\(\mathrm{CMSEP}(u^*) = \mathrm{E}\left((u - u^*)^2|y\right)\). Since it minimizes the CMSEP for given y, it also
minimizes the (marginal) MSEP. This best predictor, \(\tilde u\), differs from \(\bar u\) in that it is a
random variable rather than a constant. Also, the best predictor's MSEP can be no
larger than that of \(\bar u\):
\[
\begin{aligned}
\mathrm{MSEP}(\bar u) &= \mathrm{V}(u) \\
&= \mathrm{E}\left(\mathrm{V}(u|y)\right) + \mathrm{V}\left(\mathrm{E}(u|y)\right) \\
&\ge \mathrm{E}\left(\mathrm{V}(u|y)\right) \\
&= \mathrm{E}\left(\mathrm{E}\left((u - \tilde u)^2|y\right)\right) \\
&= \mathrm{E}\left(\mathrm{CMSEP}(\tilde u)\right) \\
&= \mathrm{MSEP}(\tilde u).
\end{aligned} \tag{5.1}
\]
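The inequality chain in (5.1) can be illustrated by Monte Carlo. The sketch below uses a bivariate normal pair (an illustrative choice), for which \(\tilde u = \mathrm{E}(u|y) = \rho y\), \(\mathrm{MSEP}(\bar u) = 1\), and \(\mathrm{MSEP}(\tilde u) = 1 - \rho^2\):

```python
import numpy as np

rng = np.random.default_rng(7)
N, rho = 200_000, 0.8
# (u, y) bivariate standard normal with correlation rho
y = rng.normal(size=N)
u = rho * y + np.sqrt(1 - rho**2) * rng.normal(size=N)

u_bar = u.mean()                          # marginal-mean predictor, approx E(u) = 0
u_tilde = rho * y                         # conditional-mean predictor E(u | y)

msep_bar = np.mean((u - u_bar) ** 2)      # approx V(u) = 1
msep_tilde = np.mean((u - u_tilde) ** 2)  # approx E(V(u|y)) = 1 - rho^2

assert msep_tilde < msep_bar
assert abs(msep_bar - 1.0) < 0.05
assert abs(msep_tilde - (1 - rho**2)) < 0.05
```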
Other predictors can be derived if the class of predictors is restricted. For
instance, minimizing MSEP among predictors that are linear functions of y results
in the best linear predictor (Searle et al. 1992):
\[
\mathrm{BLP}(u) = \mathrm{E}(u) + CV^{-1}\left(y - \mathrm{E}(y)\right), \tag{5.2}
\]
where V is the variance of y and C is the covariance of u and y. Although
not required by its definition, the BLP is an unbiased predictor of u since
\(\mathrm{E}(\mathrm{BLP}(u)) = \mathrm{E}(u)\).
5.1.2 Prediction After Data Are Observed, but with Unknown Parameters
Typically, the goal of prediction is accompanied by a need to estimate
unknown parameters \(\theta\). Here, we may first estimate the parameters using the
observed data and then estimate the best predictor. The MSEP of the estimated
best predictor can then serve as a measure of uncertainty in the prediction, although
Booth and Hobert (1998) argue that CMSEP is more relevant.
To illustrate, consider the normal linear regression model with unknown
parameters, \(\beta\), but with known variance, \(\sigma^2\). After observing data y, the goal
is to predict future independent observations \(u \sim N(Z\beta, \sigma^2 I_m)\). From either
ordinary least squares or ML, \(\hat\beta = (X'X)^{-1}X'y\). Hence, the estimate of
\(\mathrm{E}(u|y) = \mathrm{E}(u)\) is \(Z\hat\beta\), and its mean squared error of prediction can be shown to be
\(\sigma^2\left(Z(X'X)^{-1}Z' + I_m\right)\). Also, \(Z\hat\beta\) is normally distributed. These facts can be used
to create a \(100(1 - \alpha)\%\) prediction interval for a single u with covariate vector z:
\[
z'\hat\beta \pm z_{\alpha/2}\sqrt{\sigma^2\left(z'(X'X)^{-1}z + 1\right)}. \tag{5.3}
\]
In the event that \(\sigma^2\) is unknown, the prediction interval given in (5.3) is
still valid but cannot be calculated. One option is to simply replace \(\sigma^2\) by \(s^2\) to
produce an approximate prediction interval. A better alternative is to make use of
the following pivotal quantity:
\[
\frac{u - z'\hat\beta}{\sqrt{s^2\left(z'(X'X)^{-1}z + 1\right)}} \sim t_{n-k}, \tag{5.4}
\]
where k is the number of regressors. An exact \(100(1 - \alpha)\%\) prediction interval is
then
\[
z'\hat\beta \pm t_{n-k, \alpha/2}\sqrt{s^2\left(z'(X'X)^{-1}z + 1\right)}. \tag{5.5}
\]
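Interval (5.5) is straightforward to compute; the sketch below uses simulated data (illustrative values, not from the text) and the usual unbiased variance estimate \(s^2\):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 0.5, -0.25])
y = X @ beta + rng.normal(size=n)

bhat = np.linalg.solve(X.T @ X, X.T @ y)
s2 = np.sum((y - X @ bhat) ** 2) / (n - k)     # s^2 = RSS / (n - k)

z = np.array([1.0, 1.0, 1.0])                  # covariates of the future observation
alpha = 0.05
half = stats.t.ppf(1 - alpha / 2, n - k) * np.sqrt(
    s2 * (z @ np.linalg.solve(X.T @ X, z) + 1.0)
)
lo, hi = z @ bhat - half, z @ bhat + half

assert lo < z @ bhat < hi
```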
This simple situation highlights the fact that unknown parameters can
complicate the prediction problem. While pivotal quantities may be available in
simple situations, accommodating unknown parameters may be more cumbersome
in complicated models. Therefore, attempts have been made to unify predictive
inference via predictive likelihood.
5.2 Predictive Likelihood
Predictive likelihood is a likelihood-based framework for handling the
problem of prediction. The goal is to predict u via a point or interval predictor in
the presence of unknown parameters \(\theta\). If u is treated as a vector of parameters,
then the joint likelihood of u and \(\theta\) is
\[
L^*(u, \theta; y) = f(y, u|\theta) = f_1(u|\theta, y)\,f_2(y|\theta). \tag{5.6}
\]
A predictive likelihood for u would result by somehow eliminating \(\theta\) from the above
expression. For Bayesians, this can be achieved in a straightforward manner. The
posterior density of \(\theta\) given the data y is obtained, and then (5.6) is integrated over
the density of \(\theta\). The resulting expression is a posterior predictive likelihood for u.
For frequentists there are several competing options, as outlined in Butler (1986)
and Bjørnstad (1990). Here we consider two: the estimative predictive likelihood
and the profile predictive likelihood.
5.2.1 Estimative and Profile Predictive Likelihood
The estimative or empirical predictive likelihood replaces \(\theta\) in (5.6) by \(\hat\theta\), its
MLE. The resulting predictive likelihood is actually proportional to the density of u
given y with the MLE inserted in place of \(\theta\). This is considered a naive approach, as
resulting inferences for u do not take into account the uncertainty in the estimated
parameters. An alternative frequentist predictive likelihood can be motivated by
profile likelihood. The idea works as follows. If \(\theta\) is considered a set of nuisance
parameters, then a technique for estimation is to first maximize the joint likelihood
of \((u, \theta)\) with respect to \(\theta\) for fixed u. The resulting solution, \(\hat\theta(y, u)\), is then
substituted into the likelihood. The resulting expression is a function of only u and
y, and is known as the profile predictive likelihood of u (Mathiasen 1979; Lejeune
and Faulkenberry 1982; Levy and Perng 1984).
5.2.2 Point and Interval Predictions
Regardless of which predictive likelihood is chosen, point and interval
predictions are of interest. One option for a point prediction is to maximize the
predictive likelihood \(\ell(u; y)\) with respect to u. However, the resulting maximum
likelihood predictor can have poor properties (see Bjørnstad, 1990, examples 4 and
7). An alternative can be motivated by standardizing the predictive likelihood to be
a predictive density for u: \(f(u|y) = \ell(u; y)/f_2(y)\). The mean of this conditional
density, the predictive mean, is the preferred point predictor. Standard errors and
prediction intervals can also be constructed from this predictive density.
5.2.2.1 Example
Consider again the situation of predicting future observations in a linear
regression model with \(\theta' = (\beta', \sigma^2)\) unknown. Naively replacing \(\theta\) by \(\hat\theta\) in the
conditional distribution of u given y yields the estimative predictive likelihood.
Hence, inference proceeds under the assumption that u is conditionally distributed
as normal with mean \(Z\hat\beta\) and variance \(\hat\sigma^2 I_m\). A prediction interval for a single u
with covariate vector z would then be
\[
z'\hat\beta \pm z_{\alpha/2}\,\hat\sigma\sqrt{z'(X'X)^{-1}z + 1}.
\]
In contrast, the conditional distribution of u from standardizing the profile
predictive likelihood is multivariate t with mean \(Z\hat\beta\) and dispersion matrix
\(s^2\left(I_m + Z(X'X)^{-1}Z'\right)\), as shown by Levy and Perng (1984). A prediction interval
for a single u would then be
\[
z'\hat\beta \pm t_{n-k, \alpha/2}\sqrt{s^2\left(z'(X'X)^{-1}z + 1\right)}.
\]
This matches the exact prediction interval given in (5.5).
5.3 Influence Measures for Prediction
In this section, some general measures are proposed for assessing the effects of
perturbation on point estimation and point prediction. Notation is as follows: \(\hat\theta\) are
point estimates for parameters \(\theta\), and \(\hat u\) are point predictions for random variables
u. Also, let \(\hat\theta_\omega\) and \(\hat u_\omega\) denote the perturbed versions of these point inferences, and
let \(PL(u; y, \theta)\) denote the log of the unstandardized predictive likelihood.
5.3.1 Predictive Likelihood Displacement
An overall measure of the change in estimates \(\hat\theta\) and predictions \(\hat u\) due to
perturbation is the predictive likelihood displacement:
\[
PLD(\omega) = 2\left[PL(\hat u; y, \hat\theta) - PL(\hat u_\omega; y, \hat\theta_\omega)\right]. \tag{5.7}
\]
A local assessment of this influence measure could be first- or second-order,
depending on the model and the method of inference.
Thomas and Cook (1990) considered influence on predictions in generalized
linear models (GLMs). To explain, let u denote a single future observation that
is independent of the observed data. Also, let \(f(\cdot|\theta)\) denote the density function
shared by the observed data and the future observation. Thomas and Cook (1990)
proposed the following influence measure for assessing the effects of perturbations
on predicting u:
\[
d^*(\omega) = \log f(\hat u|\hat\theta) - \log f(\hat u|\hat\theta_\omega). \tag{5.8}
\]
In the context of the article, parameters were estimated by ML and point predictions
were \(\hat u = \mathrm{E}(u|y, \hat\beta)\). For ML estimation of a GLM, \(d^*(\omega)\) is non-negative and
minimized at the null perturbation, indicating that a second-order local assessment
is appropriate. However, in other situations it can be negative.
The influence measure \(d^*(\omega)\) is motivated by a deviance residual for the \(i\)th
observation (McCullagh and Nelder 1989):
\[
\log f(y_i|\mu_i = y_i) - \log f(y_i|\mu_i = \hat\mu_i(\hat\theta)), \tag{5.9}
\]
where \(\mathrm{E}(y_i) = \mu_i\) and \(\hat\mu_i\) is the estimated mean of the observation based on the
model. In the context of GLMs, each deviance residual is non-negative, with smaller
values indicating a better fit. The deviance of a model is defined as the sum of the n
deviance residuals and serves as a measure of overall goodness of fit.
The influence measure \(d^*(\omega)\) can be related to predictive inference in the
following way. First, define the following modified predictive likelihood displacement:
\[
PLD^*(\omega) = 2\left[PL(\hat u; y, \hat\theta) - PL(\hat u; y, \hat\theta_\omega)\right]. \tag{5.10}
\]
This differs from \(PLD(\omega)\) in that the prediction for u is not perturbed. Also,
assuming y and u share the same density and that all elements of y and u are
independent, we have
\[
\begin{aligned}
PLD^*(\omega) &= 2\Big[\log f(y; \hat\theta) + \sum_{i=1}^{m}\log f(\hat u_i; \hat\theta) - \log f(y; \hat\theta_\omega) - \sum_{i=1}^{m}\log f(\hat u_i; \hat\theta_\omega)\Big] \\
&= 2\left[\log f(y; \hat\theta) - \log f(y; \hat\theta_\omega)\right]
+ 2\sum_{i=1}^{m}\left(\log f(\hat u_i; \hat\theta) - \log f(\hat u_i; \hat\theta_\omega)\right) \\
&= LD(\omega) + 2\sum_{i=1}^{m}d_i^*(\omega),
\end{aligned} \tag{5.11}
\]
where \(d_i^*(\omega)\) is the deviance residual of perturbation for the \(i\)th prediction. Note that
the prediction for u is the same in both terms of \(d_i^*(\omega)\). Technically then, \(d_i^*(\omega)\) is a
measure of the perturbation's effects on the estimated conditional mean of u rather
than a change in the prediction for u.
We note that under the same assumptions given above, the predictive
likelihood displacement (5.7) can be written as
\[
PLD(\omega) = LD(\omega) + 2\sum_{i=1}^{m}d_i(\omega),
\]
where
\[
d_i(\omega) = \log f(\hat u_i|\hat\theta) - \log f(\hat u_{i\omega}|\hat\theta_\omega).
\]
5.4 Estimation and Prediction in Linear Mixed Models
In this section, the linear mixed model is presented, with point estimates and
point predictions motivated in three ways. Then, local influence techniques for the
linear mixed model are reviewed, and additional tools for assessing predictions are
presented.
5.4.1 Linear Mixed Models
The linear mixed model (LMM) is formulated as follows:
\[
y = X\beta + Zu + \epsilon,
\qquad
\begin{bmatrix} u \\ \epsilon \end{bmatrix} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix}\right), \tag{5.12}
\]
where y is an \(n \times 1\) vector of observations, X is a known \(n \times p_1\) matrix, Z is a known
\(n \times p_2\) matrix, \(\beta\) is a \(p_1 \times 1\) vector of unknown fixed effects, u is a \(p_2 \times 1\) vector of
unknown random effects, and \(\epsilon\) is an \(n \times 1\) vector of unknown residual errors. The
matrices G and R are of order \(p_2\) and n, respectively.
Littell et al. (1996) provide a host of applications for this rich class of models.
When there are no random effects and \(R = \sigma^2 I\), this model simplifies to the fixed
effects regression model (1.10). The inclusion of random effects and the general
residual variance matrix R permits the modeling of correlated data. This is evident
from the marginal distribution of the data under model (5.12):
\[
y \sim N\left(X\beta,\; ZGZ' + R\right). \tag{5.13}
\]
The unknowns in (5.12) are the fixed effects \(\beta\), the random effects u, and the
elements of the covariance matrices G and R. In this chapter, we will concentrate
on assessing the effects of perturbations on inferences for \(\beta\) and u. Point estimates
of \(\beta\) and point predictions of u can be motivated by three methods: first principles,
mixed model equations, and profile predictive likelihood. See Searle et al. (1992) for
information on estimating G and R when these matrices are unknown.
5.4.2 Inference by First Principles
The best linear unbiased estimates (BLUEs) of \(\beta\) are the generalized least
squares estimates:
\[
\hat\beta = (X'V^{-1}X)^{-1}X'V^{-1}y, \tag{5.14}
\]
where \(V = ZGZ' + R\). Here, "best" refers to minimum mean squared error of
estimation. The estimates are equivalent to the MLEs derived from the marginal
distribution of y.
The best linear unbiased predictors (BLUPs) for u are given by
\[
\hat u = GZ'V^{-1}c, \tag{5.15}
\]
where \(c = y - X\hat\beta\) is the vector of generalized least squares or composite residuals,
the latter term being coined by Hilden-Minton (1995). Here, "best" refers to
minimum MSEP.
Note that the BLUEs and BLUPs do not require normality, only the model
and covariance structure specified in (5.12). Detailed derivations can be found in
Searle (1971) and Searle et al. (1992).
5.4.3 Inference by the Mixed Model Equations
Henderson et al. (1959) derived estimates for the LMM by maximizing the
joint log-likelihood of u and \beta:

$$L(u,\beta;y) \propto -\tfrac12\left[(y - X\beta - Zu)'R^{-1}(y - X\beta - Zu) + u'G^{-1}u\right]. \qquad (5.16)$$

This likelihood is also known as an h-likelihood (Lee and Nelder 1996). The solutions
that maximize (5.16) are in fact the same as the BLUEs and BLUPs. The
resulting estimating equations are called the mixed model equations (MMEs) and
are usually written

$$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}\begin{bmatrix} \hat\beta \\ \hat u \end{bmatrix} = \begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}. \qquad (5.17)$$

The corresponding set of estimating functions is

$$h(u,\beta;y) = \begin{bmatrix} X'R^{-1}(y - X\beta - Zu) \\ Z'R^{-1}(y - X\beta - Zu) - G^{-1}u \end{bmatrix}. \qquad (5.18)$$
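The equivalence between the MME solutions and the BLUEs/BLUPs can be confirmed numerically. A hedged sketch on simulated toy data (the design and variance values are arbitrary assumptions, not from the dissertation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Small layout: 3 clusters of sizes 2, 3, 4; random intercept per cluster.
sizes = [2, 3, 4]
n, g = sum(sizes), len(sizes)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.zeros((n, g))
row = 0
for i, ni in enumerate(sizes):
    Z[row:row + ni, i] = 1.0
    row += ni
G, R = 2.0 * np.eye(g), 1.5 * np.eye(n)
y = rng.normal(size=n)
Rinv, Ginv = np.linalg.inv(R), np.linalg.inv(G)

# Coefficient matrix and right-hand side of the MMEs (5.17).
C = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
              [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv]])
rhs = np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y])
sol = np.linalg.solve(C, rhs)
beta_mme, u_mme = sol[:X.shape[1]], sol[X.shape[1]:]

# The same solution from the marginal ("first principles") formulas.
Vinv = np.linalg.inv(Z @ G @ Z.T + R)
beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
u = G @ Z.T @ Vinv @ (y - X @ beta)
print(np.allclose(beta_mme, beta), np.allclose(u_mme, u))   # True True
```

Solving the MMEs avoids inverting the n x n matrix V, which is why they are the standard computational route in mixed-model software.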
5.4.4 Inference by Profile Predictive Likelihood

The BLUEs and BLUPs can also be derived via the profile predictive
likelihood. Recall that inference using the profile predictive likelihood works as
follows:

1. Maximize the joint likelihood of u and \beta with respect to \beta. Denote
   the solution as \hat\beta(y,u).
2. Substitute \hat\beta(y,u) into the joint likelihood. The resulting expression,
   L*(u, \hat\beta(y,u); y), is the profile predictive likelihood of u.
3. Obtain a prediction for u by either of the following methods:
   a. Obtain the maximum likelihood predictor (MLP) by maximizing
      L*(u, \hat\beta(y,u); y) with respect to u.
   b. Obtain the conditional/predictive mean of u given (y', \hat\beta(y,u))
      from f(u | y, \hat\beta(y,u)) \propto L*(u, \hat\beta(y,u); y).
4. Obtain point estimates of \beta by substituting the predictor of u from
   step three into \hat\beta(y,u).
By definition, the point inferences from steps 3a and 4 maximize the joint
likelihood of u and \beta. This implies that using the MLP from the profile predictive
likelihood leads to the BLUEs and BLUPs. However, it is not as clear that steps
3b and 4 lead to the same solutions, and so the derivation follows. The first step
is to maximize the joint likelihood of u and \beta with respect to \beta for given u. This
leads to the first set of mixed model equations (5.17), which have solution
\hat\beta(y,u) = (X'R^{-1}X)^{-1}X'R^{-1}(y - Zu). The next step is to show
that predicting u with its conditional mean given the data and the estimate \hat\beta(y,u)
leads to the second set of mixed model equations. Starting with the profile predictive
likelihood, we have
$$\begin{aligned}
L^*(u,\hat\beta(y,u);y) &\propto \exp\Big(-\tfrac12\big(y - X\hat\beta(y,u) - Zu\big)'R^{-1}\big(y - X\hat\beta(y,u) - Zu\big) - \tfrac12 u'G^{-1}u\Big)\\
&= \exp\Big(-\tfrac12\, y'R^{-1}\big(y - X\hat\beta(y,u) - Zu\big) + \tfrac12\, \hat\beta'(y,u)X'R^{-1}\big(y - X\hat\beta(y,u) - Zu\big)\\
&\qquad\quad + \tfrac12\, u'Z'R^{-1}\big(y - X\hat\beta(y,u) - Zu\big) - \tfrac12\, u'G^{-1}u\Big)\\
&= \exp\Big(-\tfrac12\, y'R^{-1}\big(y - X\hat\beta(y,u) - Zu\big) + \tfrac12\, u'Z'R^{-1}\big(y - X\hat\beta(y,u) - Zu\big) - \tfrac12\, u'G^{-1}u\Big), \qquad (5.19)
\end{aligned}$$
since X'R^{-1}(y - X\hat\beta(y,u) - Zu) = 0. Examining the exponent, we find
$$\begin{aligned}
&-\tfrac12\Big(y'R^{-1}\big(y - X\hat\beta(y,u) - Zu\big) - u'Z'R^{-1}\big(y - X\hat\beta(y,u) - Zu\big) + u'G^{-1}u\Big)\\
&= -\tfrac12\Big(y'R^{-1}y - y'R^{-1}X\hat\beta(y,u) - 2u'Z'R^{-1}y + u'Z'R^{-1}X\hat\beta(y,u) + u'Z'R^{-1}Zu + u'G^{-1}u\Big)\\
&\propto -\tfrac12\Big(-y'R^{-1}X(X'R^{-1}X)^{-1}X'R^{-1}(y - Zu) - 2u'Z'R^{-1}y\\
&\qquad\qquad + u'Z'R^{-1}X(X'R^{-1}X)^{-1}X'R^{-1}(y - Zu) + u'Z'R^{-1}Zu + u'G^{-1}u\Big)\\
&\propto -\tfrac12\Big(-2u'\big(Z'R^{-1} - Z'R^{-1}X(X'R^{-1}X)^{-1}X'R^{-1}\big)y\\
&\qquad\qquad + u'\big(-Z'R^{-1}X(X'R^{-1}X)^{-1}X'R^{-1}Z + Z'R^{-1}Z + G^{-1}\big)u\Big)\\
&= -\tfrac12\Big(u'\big(Z'(I - R^{-1}X(X'R^{-1}X)^{-1}X')R^{-1}Z + G^{-1}\big)u - 2u'\big(Z'(I - R^{-1}X(X'R^{-1}X)^{-1}X')\big)R^{-1}y\Big)\\
&= -\tfrac12\Big(u'\big(Z'P'R^{-1}Z + G^{-1}\big)u - 2u'Z'P'R^{-1}y\Big), \qquad (5.20)
\end{aligned}$$
where P' = I - R^{-1}X(X'R^{-1}X)^{-1}X'. It follows that

$$f(u \mid y, \hat\beta(y,u)) \propto \exp\Big(-\tfrac12\big(u'(Z'P'R^{-1}Z + G^{-1})u - 2u'Z'P'R^{-1}y\big)\Big). \qquad (5.21)$$

This is proportional to the kernel of a normal distribution with mean

$$(Z'P'R^{-1}Z + G^{-1})^{-1}Z'P'R^{-1}y, \qquad (5.22)$$

and this predictive mean is the point prediction for u.
Meanwhile, substituting the solution from the first set of MMEs into the
second set yields

$$Z'R^{-1}X(X'R^{-1}X)^{-1}X'R^{-1}(y - Zu) + (Z'R^{-1}Z + G^{-1})u = Z'R^{-1}y. \qquad (5.23)$$

Solving for u provides the BLUP of the random effects,

$$\hat u = (Z'P'R^{-1}Z + G^{-1})^{-1}Z'P'R^{-1}y,$$

and this is identical to the predictive mean (5.22). Hence, predicting u with the
predictive mean from the profile predictive likelihood yields point inferences for u
and \beta that are identical to those from the mixed model equations.
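The equivalence just derived, namely that the predictive mean (5.22) coincides with the BLUP, is easy to confirm numerically. A sketch with simulated data (the layout and variance values are arbitrary assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
sizes = [2, 3, 4]
n, g = sum(sizes), len(sizes)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.zeros((n, g))
row = 0
for i, ni in enumerate(sizes):
    Z[row:row + ni, i] = 1.0
    row += ni
G, R = 2.0 * np.eye(g), 1.5 * np.eye(n)
y = rng.normal(size=n)
Rinv, Ginv = np.linalg.inv(R), np.linalg.inv(G)

# P' = I - R^{-1} X (X'R^{-1}X)^{-1} X'
P = np.eye(n) - Rinv @ X @ np.linalg.solve(X.T @ Rinv @ X, X.T)

# Predictive mean (5.22)
u_pred = np.linalg.solve(Z.T @ P @ Rinv @ Z + Ginv, Z.T @ P @ Rinv @ y)

# BLUP from the marginal formulas
Vinv = np.linalg.inv(Z @ G @ Z.T + R)
beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
u_blup = G @ Z.T @ Vinv @ (y - X @ beta)
print(np.allclose(u_pred, u_blup))   # True
```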
5.5 Local Influence Methods for Linear Mixed Models
5.5.1 Assessment in the Marginal Model
Local influence techniques for linear mixed models have been developed using
the marginal model (5.13). Beckman et al. (1987) considered a special case of the
linear mixed model known as the variance components model. This model is used
for data with a sources of grouping or clustering, where the i-th source has g_i levels.
For Z_i of size n x g_i and u_i of size g_i x 1, the model is given by

$$y = X\beta + Z_1u_1 + Z_2u_2 + \cdots + Z_au_a + e. \qquad (5.24)$$

This structure implies Z is the partitioned matrix [Z_1\; Z_2\; \cdots\; Z_a], and u has a
total of p_2 = \sum_{i=1}^a g_i elements. In addition to the general LMM assumptions, the u_i
vectors are mutually independent, with V(u_i) = \sigma_i^2 I_{g_i} and V(e) = \sigma^2 I_n. The model
implies that the marginal variance of Y can be written as

$$V = ZGZ' + R = \sum_{i=1}^{a}\sigma_i^2 Z_iZ_i' + \sigma^2 I.$$
Beckman et al. (1987) assumed the fixed effects as well as the variance parameters,
(\sigma^2, \sigma_1^2, \ldots, \sigma_a^2), are jointly estimated by ML. Two perturbation schemes were
introduced. The first scheme perturbs the i-th residual variance to \omega_i\sigma^2, while the
second scheme introduces a perturbation to the variance of each random effect: the
variance of the j-th element of u_i becomes \omega_{ij}\sigma_i^2. While both of these perturbations
technically perturb the model assumptions, the major goal is to assess the influence
of individual observations and clusters, respectively.
Lesaffre and Verbeke (1998) emphasized applications to longitudinal data.
They considered the following linear mixed model:

$$y_i = X_i\beta + Z_iu_i + \epsilon_i, \qquad i = 1,\ldots,g, \qquad (5.25)$$

where y_i is a vector of n_i observations, X_i is an n_i x p matrix, Z_i is an n_i x t
matrix, u_i is a vector of t random effects, and \epsilon_i is a vector of n_i errors. It is
assumed that V(u_i) = G_i, V(\epsilon_i) = \sigma^2 I_{n_i}, and that y_i and y_{i'} are independent for
i \ne i'. The model permits the inclusion of random slopes, and the matrix G_i can be
as general as needed. Also note that the model has a total of p_2 = gt random effects.
In the context considered by Lesaffre and Verbeke (1998), the subscript i
corresponds to the i-th subject, from whom repeated measurements are taken over
time. The model implies that the log-likelihood is the sum of contributions from each
of the subjects, and by allowing each of the g subject weights to vary from 1, a
subject-level case weight was introduced as a perturbation. Because the perturbation
does not correspond to a new density for the data, it would be considered a
perturbation of the estimation technique. Diagnostics were primarily based on basic
perturbations (i.e., one-at-a-time perturbations) of subject weights. Lesaffre and
Verbeke (1998) showed that the basic curvatures could be written as the sum of
several terms, and the easily-interpretable terms were used as a means to compare
the relative influence of subjects.
5.5.2 Assessment of Predictions
The influence of perturbations on both the BLUEs and BLUPs can be
assessed by applying the techniques developed in Chapter 4 to the predictive
likelihood displacement (5.7). The estimating functions, h(\theta; y), are those used
in the mixed model equations (5.18), and if the perturbed model is also a linear
mixed model, the perturbed estimating functions, h_\omega(\theta; y), are of the same form.
Local assessment of the predictive likelihood displacement requires the following
quantities:

$$\theta = \begin{bmatrix} \beta \\ u \end{bmatrix}, \qquad \gamma(\hat\theta_\omega) = \hat\theta_\omega, \qquad m = PLD(\omega) = 2\big[PL(\hat u; y, \hat\beta) - PL(\hat u_\omega; y, \hat\beta_\omega)\big].$$
The next step is to determine whether first- or second-order local assessment is
appropriate. Since the influence measure does not explicitly depend on \omega, s_\omega = 0.
Also, by derivation of the mixed model equations,

$$\frac{\partial PLD(\omega)}{\partial \gamma(\hat\theta_\omega)}\bigg|_{\hat\theta} = \frac{\partial}{\partial \hat\theta_\omega}\big({-2}PL(\hat u_\omega; y, \hat\beta_\omega)\big)\bigg|_{\hat\theta} = -2\,\frac{\partial L(\theta; y)}{\partial \theta}\bigg|_{\hat\theta} = 0.$$
The implication is that the gradient vector vanishes. Proceeding with a second-order
assessment, we note

$$S = \begin{bmatrix} \dfrac{\partial^2 m}{\partial\omega\,\partial\omega'}\Big|_0 & \dfrac{\partial^2 m}{\partial\omega\,\partial\gamma(\hat\theta)'}\Big|_0 \\[6pt] \dfrac{\partial^2 m}{\partial\gamma(\hat\theta)\,\partial\omega'}\Big|_0 & \dfrac{\partial^2 m}{\partial\gamma(\hat\theta)\,\partial\gamma(\hat\theta)'}\Big|_0 \end{bmatrix}.$$

Using (4.32), the acceleration matrix is then

$$\ddot M = 2\Delta'\bar h^{-1}\left[-\frac{\partial^2 L(\theta;y)}{\partial\theta\,\partial\theta'}\right]\bar h^{-1}\Delta = -2\Delta'\bar h^{-1}\Delta, \qquad (5.26)$$
where \bar h = \partial h(u,\beta;y)/\partial\theta'|_0. The matrix \bar h is derived as follows. Taking derivatives
of the estimating functions with respect to \theta, we find:

$$\frac{\partial h(u,\beta;y)}{\partial\theta'} = \frac{\partial}{\partial\theta'}\begin{bmatrix} X'R^{-1}(y - X\beta - Zu) \\ Z'R^{-1}(y - X\beta - Zu) - G^{-1}u \end{bmatrix} = -\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}. \qquad (5.27)$$
The expression is not a function of \theta, so (5.27) is in fact \bar h. It can be shown by
inverting this matrix that the variance matrix of the BLUEs and BLUPs is (Searle
et al. 1992)

$$-\bar h^{-1} = \begin{bmatrix} h^{11} & h^{12} \\ h^{21} & h^{22} \end{bmatrix}, \qquad (5.28)$$

where

$$h^{11} = (X'V^{-1}X)^{-1}, \qquad h^{21} = -GZ'V^{-1}Xh^{11}, \qquad h^{22} = (Z'R^{-1}Z + G^{-1})^{-1} - h^{21}X'V^{-1}ZG.$$
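The block formulas in (5.28) can be checked by inverting \bar h directly. A numerical sketch on a small simulated design (all values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
sizes = [2, 3, 4]
n, g = sum(sizes), len(sizes)
p1 = 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.zeros((n, g))
row = 0
for i, ni in enumerate(sizes):
    Z[row:row + ni, i] = 1.0
    row += ni
G, R = 2.0 * np.eye(g), 1.5 * np.eye(n)
Rinv, Ginv = np.linalg.inv(R), np.linalg.inv(G)
V = Z @ G @ Z.T + R
Vinv = np.linalg.inv(V)

# hbar from (5.27); -hbar^{-1} is the variance matrix of (beta-hat, u-hat - u).
hbar = -np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
                  [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv]])
H = -np.linalg.inv(hbar)

# Blocks from (5.28).
h11 = np.linalg.inv(X.T @ Vinv @ X)
h21 = -G @ Z.T @ Vinv @ X @ h11
h22 = np.linalg.inv(Z.T @ Rinv @ Z + Ginv) - h21 @ X.T @ Vinv @ Z @ G

print(np.allclose(H[:p1, :p1], h11),
      np.allclose(H[p1:, :p1], h21),
      np.allclose(H[p1:, p1:], h22))   # True True True
```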
Once a perturbation is chosen and the perturbed estimating functions are
constructed, computation of the acceleration matrix (5.26) is straightforward.
5.5.3 Assessment for Subsets of Point Inferences
Influence measures for subsets of the perturbed point inferences can also be
developed. If only the fixed effects are of interest, then assessment can be done
using the marginal model (Beckman et al. 1987; Lesaffre and Verbeke 1998). More
generally, a profile predictive likelihood displacement can be defined to assess the
change in a subset of fixed and random effect point inferences, \theta_1 = (\beta_1', u_1')':

$$PLD^{[\theta_1]}(\omega) = 2\big[PL(\hat u; y, \hat\beta) - PL(\hat u_{1\omega}, g_u(\hat\theta_{1\omega}); y, \hat\beta_{1\omega}, g_\beta(\hat\theta_{1\omega}))\big], \qquad (5.29)$$

where \hat\beta_{1\omega} and \hat u_{1\omega} come from the partitions \hat\beta_\omega = (\hat\beta_{1\omega}', \hat\beta_{2\omega}')' and
\hat u_\omega = (\hat u_{1\omega}', \hat u_{2\omega}')', and (g_\beta'(\hat\theta_{1\omega}), g_u'(\hat\theta_{1\omega})) maximizes
PL(\hat u_{1\omega}, u_2; y, \hat\beta_{1\omega}, \beta_2) with respect to (\beta_2', u_2'). For
linear mixed models, the acceleration matrix follows from the arguments for deriving
the acceleration matrix of LD^{[\theta_1]}(\omega) in Section 4.6.4, and is given by

$$\ddot M = -2\Delta'\left(\bar h^{-1} - \begin{bmatrix} 0 & 0 \\ 0 & \bar h_{22}^{-1} \end{bmatrix}\right)\Delta,$$
where \bar h_{22} comes from the partition

$$\bar h = \begin{bmatrix} \bar h_{11} & \bar h_{12} \\ \bar h_{21} & \bar h_{22} \end{bmatrix}.$$
When influence on a linear combination, x'\hat\beta + z'\hat u, is of interest, the results of
Section 2.7.1 can be used. Letting x^{*\prime} = (x', z'), it follows from the argument of
Theorem 2.1 that

$$v_{\max} \propto \Delta'\bar h^{-1}x^*. \qquad (5.30)$$
5.6 One-way Random Effects Model

The one-way random effects model can be used to provide insight into the
diagnostic value of locally assessing the linear mixed model. Some preliminary
results for this special case of the LMM are presented here, so as not to clutter later
explanations.
The (unbalanced) one-way random effects model is

$$y_{ij} = \mu + u_i + e_{ij}, \qquad i = 1,\ldots,g, \quad j = 1,\ldots,n_i, \qquad (5.31)$$

with V(u_i) = \sigma_u^2 and V(e_{ij}) = \sigma^2. It is a special case of the variance components
model (5.24) with a = 1, X = 1_n, and \beta = \mu, and is also a special case of the
two-level model given in (5.25). If all n_i are equal to n/g, the design is said to be
balanced. It can be shown that

$$\hat\mu = \frac{\sum_{i=1}^{g}\frac{n_i}{\sigma^2 + n_i\sigma_u^2}\,\bar y_i}{\sum_{i=1}^{g}\frac{n_i}{\sigma^2 + n_i\sigma_u^2}} \qquad \text{and} \qquad \hat u_i = \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_i}\,\bar c_i, \qquad (5.32)$$

where \bar y_i and \bar c_i are the average response and average composite residual, respectively,
for the i-th cluster (Searle et al. 1992).
The building blocks for deriving \bar h are as follows:

$$X = 1_n, \qquad Z = \begin{bmatrix} 1_{n_1} & & \\ & \ddots & \\ & & 1_{n_g} \end{bmatrix}, \qquad R = \sigma^2 I_n, \qquad G = \sigma_u^2 I_g,$$

$$X'X = n, \qquad Z'Z = D(n_i), \qquad X'Z = n' = (n_1, n_2, \ldots, n_g).$$
It follows that

$$-\bar h = \begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix} = \begin{bmatrix} n/\sigma^2 & n'/\sigma^2 \\[2pt] n/\sigma^2 & D\!\left(\dfrac{\sigma^2 + n_i\sigma_u^2}{\sigma^2\sigma_u^2}\right) \end{bmatrix}, \qquad (5.33)$$

where the n in the upper-left cell is the scalar X'X and the n in the lower-left cell is the vector (n_1,\ldots,n_g)'.
The following result for a partitioned matrix with a scalar upper-left submatrix
can be used to invert the matrix (Searle 1982):

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} 0 & 0' \\ 0 & D^{-1} \end{bmatrix} + \frac{1}{k}\begin{bmatrix} 1 \\ -D^{-1}C \end{bmatrix}\begin{bmatrix} 1 & -BD^{-1} \end{bmatrix}, \qquad (5.34)$$

where k = A - BD^{-1}C. Applying this result, we find
$$-\bar h^{-1} = \frac{1}{k}\,M(n_i,n_j) + \begin{bmatrix} 0 & 0' \\ 0 & D\!\left(\dfrac{\sigma^2\sigma_u^2}{\sigma^2 + n_i\sigma_u^2}\right) \end{bmatrix}, \qquad (5.35)$$

where

$$M(n_i,n_j) = \begin{bmatrix} 1 \\ -D\!\left(\dfrac{\sigma_u^2}{\sigma_u^2+\sigma^2/n_i}\right)1_g \end{bmatrix}\begin{bmatrix} 1 & -1_g'\,D\!\left(\dfrac{\sigma_u^2}{\sigma_u^2+\sigma^2/n_j}\right) \end{bmatrix} \quad\text{and}\quad k = \sum_{i=1}^{g}\frac{1}{\sigma_u^2+\sigma^2/n_i}.$$
When the design is balanced, k = g/(\sigma_u^2 + g\sigma^2/n) = (\sigma_u^2/g + \sigma^2/n)^{-1}, and the following simplifications can be
made:

$$-\bar h^{-1} = \begin{bmatrix} \sigma_u^2/g + \sigma^2/n & -\dfrac{\sigma_u^2}{g}\,1_g' \\[6pt] -\dfrac{\sigma_u^2}{g}\,1_g & \dfrac{\sigma^2\sigma_u^2/n}{\sigma_u^2/g + \sigma^2/n}\,I_g + \dfrac{(\sigma_u^2/g)^2}{\sigma_u^2/g + \sigma^2/n}\,J_g \end{bmatrix}.$$
Of particular interest are the first and second columns of -\bar h^{-1}:

$$-\bar h^{-1}d_1 = \frac{1}{k}\begin{bmatrix} 1 \\ -\dfrac{\sigma_u^2}{\sigma_u^2+\sigma^2/n_1} \\ \vdots \\ -\dfrac{\sigma_u^2}{\sigma_u^2+\sigma^2/n_g} \end{bmatrix} \qquad (5.36)$$

$$-\bar h^{-1}d_2 = \frac{\sigma_u^2}{k(\sigma_u^2+\sigma^2/n_1)}\begin{bmatrix} -1 \\ \dfrac{k\sigma^2}{n_1} + \dfrac{\sigma_u^2}{\sigma_u^2+\sigma^2/n_1} \\ \dfrac{\sigma_u^2}{\sigma_u^2+\sigma^2/n_2} \\ \vdots \\ \dfrac{\sigma_u^2}{\sigma_u^2+\sigma^2/n_g} \end{bmatrix} \qquad (5.37)$$
5.7 Residual Variance Perturbation Scheme
In this section, a residual variance perturbation is proposed. The perturbation
is applicable to the general form of the linear mixed model. Simplifications when
R = \sigma^2 I are presented, and some closed-form expressions for the one-way model are
given.
5.7.1 Form of the Perturbation
Let \sigma_{ij} denote the (i,j)-th element of the matrix R. A set of n perturbations is
introduced into the model such that the perturbed residual covariance matrix is

$$R_\omega = D(\omega)^{1/2}\,R\,D(\omega)^{1/2} = \begin{bmatrix} \omega_1\sigma_{11} & \sqrt{\omega_1\omega_2}\,\sigma_{12} & \cdots & \sqrt{\omega_1\omega_n}\,\sigma_{1n} \\ \sqrt{\omega_1\omega_2}\,\sigma_{12} & \omega_2\sigma_{22} & \cdots & \sqrt{\omega_2\omega_n}\,\sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sqrt{\omega_1\omega_n}\,\sigma_{1n} & \sqrt{\omega_2\omega_n}\,\sigma_{2n} & \cdots & \omega_n\sigma_{nn} \end{bmatrix}.$$

The intent of this scheme is to weight the contribution of individual observations
to inference in situations when they do not necessarily enter the likelihood in an
additive fashion. The perturbation changes the variances but leaves the correlations
among the error terms unchanged. The resulting local influence diagnostics can
detect influential observations without upsetting the relationships assumed to exist
among the residual errors. The scheme simplifies to the error variance perturbation
given for regression when R = \sigma^2 I.
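The defining property of this scheme, that it rescales variances while leaving correlations intact, can be illustrated with a short numerical sketch (the AR(1)-type R and the perturbation values below are arbitrary assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# A non-diagonal residual covariance (AR(1)-like), n = 5.
n, rho, sigma2 = 5, 0.6, 1.5
R = sigma2 * rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

w = rng.uniform(0.5, 2.0, size=n)          # perturbations; null point is w = 1
sw = np.sqrt(w)
R_w = sw[:, None] * R * sw[None, :]        # (i,j) element sqrt(w_i w_j) sigma_ij

def corr(M):
    """Convert a covariance matrix to a correlation matrix."""
    d = 1.0 / np.sqrt(np.diag(M))
    return d[:, None] * M * d[None, :]

print(np.allclose(corr(R_w), corr(R)))            # True: correlations unchanged
print(np.allclose(np.diag(R_w), w * np.diag(R)))  # True: variances scaled by w_i
```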
5.7.2 Computational Building Blocks

To derive \Delta = \partial h_\omega(\theta;y)/\partial\omega'|_0, define e = y - X\beta - Zu and X^* = [X\; Z].
We begin with the perturbed joint likelihood. Ignoring terms that do not depend on
both \beta and u, we have

$$\begin{aligned} L_\omega(u,\beta;y) &= \log\big(f(y\mid\beta,u)\,g(u\mid\beta)\big) \\ &\propto -\tfrac12\big[(y - X^*\theta)'R_\omega^{-1}(y - X^*\theta)\big] \\ &= -\tfrac12\big[y'R_\omega^{-1}y - 2\theta'X^{*\prime}R_\omega^{-1}y + \theta'X^{*\prime}R_\omega^{-1}X^*\theta\big]. \end{aligned}$$

Taking derivatives with respect to \theta, we find

$$h_\omega(\theta;y) = \frac{\partial L_\omega(u,\beta;y)}{\partial\theta} = -\tfrac12\big[-2X^{*\prime}R_\omega^{-1}y + 2X^{*\prime}R_\omega^{-1}X^*\theta\big] = X^{*\prime}R_\omega^{-1}(y - X^*\theta).$$
Second partial derivatives with respect to \omega_i and \theta are obtained by differentiating
R_\omega^{-1}. Evaluating at the original model (\omega_0 = 1_n), we obtain the i-th column of \Delta:

$$\Delta_i = -X^{*\prime}R^{-1}\dot R_i R^{-1}e, \qquad (5.38)$$

where \dot R_i = \partial R_\omega/\partial\omega_i|_{\omega_0} is the matrix whose (i,i) element is \sigma_{ii}, whose (i,j) and
(j,i) elements are \tfrac12\sigma_{ij} for j \ne i, and whose remaining elements are zero.
When R = \sigma^2 I, R_\omega^{-1} = \sigma^{-2}D(1/\omega), and the expression above simplifies, allowing us
to write

$$\Delta = -\frac{1}{\sigma^2}X^{*\prime}D(e)$$

and

$$\ddot M = -2\Delta'\bar h^{-1}\Delta = \frac{2}{\sigma^2}\,D(e)\,[X\;Z]\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + \sigma^2G^{-1} \end{bmatrix}^{-1}\begin{bmatrix} X' \\ Z' \end{bmatrix}D(e).$$
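Assembling \Delta and the acceleration matrix for the R = \sigma^2 I case is mechanical. The following sketch (simulated one-way data; all names and values are illustrative assumptions) computes the acceleration matrix and extracts v_max as the eigenvector belonging to the largest eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(5)
sizes = [2, 3, 4]
n, g = sum(sizes), len(sizes)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.zeros((n, g))
row = 0
for i, ni in enumerate(sizes):
    Z[row:row + ni, i] = 1.0
    row += ni
sigma2, sigma2_u = 1.5, 2.0
G = sigma2_u * np.eye(g)
V = sigma2 * np.eye(n) + Z @ G @ Z.T
Vinv = np.linalg.inv(V)
y = rng.normal(size=n)

beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
u = G @ Z.T @ Vinv @ (y - X @ beta)
e = y - X @ beta - Z @ u                   # residuals e-hat

Xs = np.hstack([X, Z])                     # X* = [X Z]
Delta = -Xs.T @ np.diag(e) / sigma2        # Delta; the sign is immaterial for M

# Acceleration matrix (2/sigma^2) D(e) [X Z] C^{-1} [X Z]' D(e)
C = np.block([[X.T @ X, X.T @ Z],
              [Z.T @ X, Z.T @ Z + sigma2 * np.linalg.inv(G)]])
M = (2.0 / sigma2) * np.diag(e) @ Xs @ np.linalg.solve(C, Xs.T) @ np.diag(e)

vals, vecs = np.linalg.eigh(M)             # eigenvalues in ascending order
v_max = vecs[:, -1]                        # direction of maximum curvature
print(M.shape, bool(np.all(vals > -1e-8)))  # (9, 9) True (M is PSD)
```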
5.7.3 One-way Random Effects Model

Deriving results for the one-way random effects model in more detail should
provide information about the perturbation's diagnostic value. First, consider
influence on the intercept parameter estimate, \hat\mu. This can be accomplished by
using either m = \hat\mu_\omega or m = PLD^{[\mu]}(\omega) as the influence function. The direction of
maximum local influence for both measures is given by
$$\begin{aligned} v_{\max} &\propto \Delta'\bar h^{-1}d_1 \\ &\propto D(e)\,[X\;Z]\,\frac{1}{k}\begin{bmatrix} 1 \\ -\frac{\sigma_u^2}{\sigma_u^2+\sigma^2/n_1} \\ \vdots \\ -\frac{\sigma_u^2}{\sigma_u^2+\sigma^2/n_g} \end{bmatrix} \propto D(e)\begin{bmatrix} \big(1-\frac{\sigma_u^2}{\sigma_u^2+\sigma^2/n_1}\big)1_{n_1} \\ \vdots \\ \big(1-\frac{\sigma_u^2}{\sigma_u^2+\sigma^2/n_g}\big)1_{n_g} \end{bmatrix} \propto \begin{bmatrix} \dfrac{\sigma^2/n_1}{\sigma_u^2+\sigma^2/n_1}\,e_1 \\ \vdots \\ \dfrac{\sigma^2/n_g}{\sigma_u^2+\sigma^2/n_g}\,e_g \end{bmatrix}, \qquad (5.39)\end{aligned}$$
where e_i denotes the residuals for cluster i. Thus, the direction of largest local
influence is proportional to the residuals, but with each cluster scaled according to
its size. The coefficients upweight observations in smaller clusters and downweight
observations in larger clusters. The reason for this is that \hat\mu is not a simple average
of the observations: the numerator of \hat\mu in (5.32) can be written as

$$\sum_{i=1}^{g}\sum_{j=1}^{n_i}\frac{y_{ij}}{n_i\sigma_u^2 + \sigma^2}.$$

This indicates that observations from smaller clusters receive more weight. Note
that if either the design is balanced or \sigma_u^2 = 0, then v_{\max} \propto e.
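The closed form (5.39) can be confirmed against a direct computation of \Delta'\bar h^{-1}d_1. A sketch on a small unbalanced simulated layout (illustrative values only):

```python
import numpy as np

rng = np.random.default_rng(6)
sizes = [2, 3, 4]
n, g = sum(sizes), len(sizes)
X = np.ones((n, 1))                        # intercept-only one-way model
Z = np.zeros((n, g))
row = 0
for i, ni in enumerate(sizes):
    Z[row:row + ni, i] = 1.0
    row += ni
sigma2, sigma2_u = 1.5, 2.0
G, R = sigma2_u * np.eye(g), sigma2 * np.eye(n)
Rinv, Ginv = np.linalg.inv(R), np.linalg.inv(G)
V = Z @ G @ Z.T + R
Vinv = np.linalg.inv(V)
y = rng.normal(size=n)

mu = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)[0]
u = G @ Z.T @ Vinv @ (y - mu)
e = y - mu - Z @ u

Xs = np.hstack([X, Z])
Delta = Xs.T @ np.diag(e) / sigma2
hbar = -np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
                  [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv]])
d1 = np.zeros(1 + g); d1[0] = 1.0
v = Delta.T @ np.linalg.solve(hbar, d1)    # direction for mu-hat (up to sign/scale)

# Closed form (5.39): residuals scaled per cluster by (s^2/n_i)/(s_u^2 + s^2/n_i).
w = np.repeat((sigma2 / np.array(sizes)) / (sigma2_u + sigma2 / np.array(sizes)),
              sizes)
v_closed = w * e

cos = abs(v @ v_closed) / (np.linalg.norm(v) * np.linalg.norm(v_closed))
print(np.isclose(cos, 1.0))   # True: the two directions are proportional
```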
We now consider influence on the predicted random effects. Without loss of
generality, let us examine influence on \hat u_1. The direction of largest local influence is
given by
$$\begin{aligned} v_{\max} &\propto \Delta'\bar h^{-1}d_2 \\ &\propto D(e)\,[X\;Z]\,\frac{\sigma_u^2}{k(\sigma_u^2+\sigma^2/n_1)}\begin{bmatrix} -1 \\ \frac{k\sigma^2}{n_1}+\frac{\sigma_u^2}{\sigma_u^2+\sigma^2/n_1} \\ \frac{\sigma_u^2}{\sigma_u^2+\sigma^2/n_2} \\ \vdots \\ \frac{\sigma_u^2}{\sigma_u^2+\sigma^2/n_g} \end{bmatrix} \\ &\propto \begin{bmatrix} \dfrac{\sigma^2}{n_1}\Big(\displaystyle\sum_{i=2}^{g}\frac{1}{\sigma_u^2+\sigma^2/n_i}\Big)e_1 \\[6pt] -\dfrac{\sigma^2/n_2}{\sigma_u^2+\sigma^2/n_2}\,e_2 \\ \vdots \\ -\dfrac{\sigma^2/n_g}{\sigma_u^2+\sigma^2/n_g}\,e_g \end{bmatrix}. \qquad (5.40)\end{aligned}$$
Although the individual cluster sizes enter the weights, it is clear from inspection
that the residuals from the first cluster have weights roughly g - 1 times greater
than the others. In the balanced case, all n_i = n/g, and this implies

$$\sum_{i=2}^{g}\frac{\sigma^2/n_i}{\sigma_u^2+\sigma^2/n_i} = \frac{(g-1)\,\sigma^2/n}{\sigma^2/n+\sigma_u^2/g}.$$

It follows that

$$v_{\max} \propto \big((g-1)\,e_1',\; -e_2',\; \ldots,\; -e_g'\big)'.$$

Thus, in the balanced case, the residuals of the first cluster have weights that are
g - 1 times larger than the others. In fact, the BLUP of u_1 is given by

$$\hat u_1 = \frac{\sigma_u^2/g}{\sigma^2/n+\sigma_u^2/g}\left[\Big(1-\frac{1}{g}\Big)\bar y_1 - \frac{1}{g}\sum_{i=2}^{g}\bar y_i\right].$$
Not surprisingly, maximum influence on \hat u_i is achieved primarily by perturbing the
variance of observations in cluster i.
5.8 Cluster Variance Perturbation Schemes
In this section, two cluster-based perturbation schemes are considered for
assessing how clusters of observations influence estimation and prediction.

5.8.1 A Cluster-based Case Weight Perturbation

For simplicity, we consider the model given in Lesaffre and Verbeke (1998),
(5.25), but with the diagonality constraint for R_i dropped. To perturb the model,
we let G_i become \frac{1}{\omega_i}G_i and R_i become \frac{1}{\omega_i}R_i for i = 1,\ldots,g. The perturbation of
both matrices is important: it preserves the correlations among the random effects
as well as the correlations among observations in the same cluster. Hence, the only
effect of the perturbation is to change the relative weighting of the clusters, allowing
us to assess the influence of clusters of observations.
5.8.2 Computational Building Blocks

To derive \Delta we begin with the perturbed joint likelihood:

$$\begin{aligned} L_\omega(u,\beta;y) &= \log\big(f(y\mid\beta,u)\,g(u\mid\beta)\big) \\ &\propto -\tfrac12\big[(y - X^*\theta)'R_\omega^{-1}(y - X^*\theta) + u'G_\omega^{-1}u\big] \\ &= -\tfrac12\big[y'R_\omega^{-1}y - 2\theta'X^{*\prime}R_\omega^{-1}y + \theta'X^{*\prime}R_\omega^{-1}X^*\theta + u'G_\omega^{-1}u\big]. \end{aligned}$$

Partial derivatives with respect to the fixed effects \beta are

$$\frac{\partial L_\omega(u,\beta;y)}{\partial\beta} = X'R_\omega^{-1}(y - X\beta - Zu) = \sum_{i=1}^{g}\omega_i X_i'R_i^{-1}(y_i - X_i\beta - Z_iu_i).$$

Partial derivatives with respect to u are

$$\frac{\partial L_\omega(u,\beta;y)}{\partial u} = Z'R_\omega^{-1}(y - X\beta - Zu) - G_\omega^{-1}u = \sum_{i=1}^{g}\omega_i\big(Z_i'R_i^{-1}(y_i - X_i\beta - Z_iu_i) - G_i^{-1}u_i\big).$$

The perturbed estimating functions are then

$$h_\omega(\theta;y) = \begin{bmatrix} \sum_{i=1}^{g}\omega_i X_i'R_i^{-1}(y_i - X_i\beta - Z_iu_i) \\[2pt] \sum_{i=1}^{g}\omega_i\big(Z_i'R_i^{-1}(y_i - X_i\beta - Z_iu_i) - G_i^{-1}u_i\big) \end{bmatrix}. \qquad (5.41)$$
It follows that the i-th column of \Delta is

$$\Delta_i = \frac{\partial h_\omega(\theta;y)}{\partial\omega_i}\bigg|_0 = \begin{bmatrix} X_i'R_i^{-1}(y_i - X_i\hat\beta - Z_i\hat u_i) \\ Z_i'R_i^{-1}(y_i - X_i\hat\beta - Z_i\hat u_i) - G_i^{-1}\hat u_i \end{bmatrix} = \begin{bmatrix} X_i'R_i^{-1}e_i \\ Z_i'R_i^{-1}e_i - G_i^{-1}\hat u_i \end{bmatrix}. \qquad (5.42)$$

This expression can be further simplified. Because the matrices Z, R, and G are all
block diagonal, the BLUPs for the i-th cluster can be written as

$$\hat u_i = (Z_i'R_i^{-1}Z_i + G_i^{-1})^{-1}Z_i'R_i^{-1}c_i, \qquad (5.43)$$

where c_i is the vector of composite residuals for the i-th cluster. Substituting this
into the bottom portion of (5.42), we find

$$\begin{aligned} Z_i'R_i^{-1}e_i - G_i^{-1}\hat u_i &= Z_i'R_i^{-1}(c_i - Z_i\hat u_i) - G_i^{-1}\hat u_i \\ &= Z_i'R_i^{-1}c_i - (Z_i'R_i^{-1}Z_i + G_i^{-1})\hat u_i \\ &= Z_i'R_i^{-1}c_i - (Z_i'R_i^{-1}Z_i + G_i^{-1})(Z_i'R_i^{-1}Z_i + G_i^{-1})^{-1}Z_i'R_i^{-1}c_i \\ &= 0. \end{aligned}$$
This implies that only the top p_1 rows of \Delta are nonzero. The matrix can be written
compactly as

$$\Delta = \begin{bmatrix} X'R^{-1}D(e_i) \\ 0_{p_2\times g} \end{bmatrix}, \qquad (5.44)$$

where D(e_i) is an n x g block-diagonal matrix whose i-th block is e_i.
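The vanishing of the lower block in (5.42), which is the key simplification behind (5.44), can be checked numerically for the one-way model (simulated data; all values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
sizes = [2, 3, 4]
n, g = sum(sizes), len(sizes)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.zeros((n, g))
row = 0
for i, ni in enumerate(sizes):
    Z[row:row + ni, i] = 1.0
    row += ni
sigma2, sigma2_u = 1.5, 2.0
G, R = sigma2_u * np.eye(g), sigma2 * np.eye(n)
V = Z @ G @ Z.T + R
Vinv = np.linalg.inv(V)
y = rng.normal(size=n)

beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
u = G @ Z.T @ Vinv @ (y - X @ beta)
e = y - X @ beta - Z @ u

# Lower block of each column of Delta: Z_i'R_i^{-1}e_i - G_i^{-1}u_i.
# The upper block X_i'R_i^{-1}e_i is generally nonzero.
bottoms = []
row = 0
for i, ni in enumerate(sizes):
    ei = e[row:row + ni]
    bottoms.append(Z[row:row + ni, i] @ ei / sigma2 - u[i] / sigma2_u)
    row += ni
print([bool(np.isclose(b, 0.0)) for b in bottoms])   # [True, True, True]
```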
The formula has several implications. First, the acceleration matrix for the
influence measure PLD(\omega) can be written as

$$\begin{aligned} \ddot M &= -2\Delta'\bar h^{-1}\Delta \\ &= 2\begin{bmatrix} D(e_i)'R^{-1}X & 0 \end{bmatrix}\begin{bmatrix} h^{11} & h^{12} \\ h^{21} & h^{22} \end{bmatrix}\begin{bmatrix} X'R^{-1}D(e_i) \\ 0 \end{bmatrix} \\ &= 2D(e_i)'R^{-1}X\,h^{11}\,X'R^{-1}D(e_i) \\ &= 2D(e_i)'R^{-1}X(X'V^{-1}X)^{-1}X'R^{-1}D(e_i). \end{aligned} \qquad (5.45)$$

This matrix has at most p_1 distinct nonzero eigenvalues regardless of the number of
random effects. In addition, using the results given in Section 5.5.3, the acceleration
matrix for influence measure PLD^{[\beta]}(\omega) is equivalent to (5.45). Finally, the
acceleration matrix for PLD^{[u]}(\omega) is

$$\begin{aligned} \ddot M &= -2\begin{bmatrix} D(e_i)'R^{-1}X & 0 \end{bmatrix}\left(\bar h^{-1} - \begin{bmatrix} \bar h_{11}^{-1} & 0 \\ 0 & 0 \end{bmatrix}\right)\begin{bmatrix} X'R^{-1}D(e_i) \\ 0 \end{bmatrix} \\ &= 2D(e_i)'R^{-1}X\big((X'V^{-1}X)^{-1} - (X'R^{-1}X)^{-1}\big)X'R^{-1}D(e_i). \end{aligned} \qquad (5.46)$$
5.8.3 One-way Random Effects Model

Using (5.30), v_{\max} for directed influence on a linear combination of the
intercept and random effect point inferences is given by

$$v_{\max} \propto \Delta'\bar h^{-1}x^* \propto \begin{bmatrix} D(e_i)'R^{-1}X & 0_{g\times g} \end{bmatrix}\begin{bmatrix} h^{11} & h^{12} \\ h^{21} & h^{22} \end{bmatrix}x^* \propto \begin{bmatrix} n_1\bar e_1 \\ \vdots \\ n_g\bar e_g \end{bmatrix}. \qquad (5.47)$$

Using the relationship between \bar e_i = \bar c_i - \hat u_i and \hat u_i given in (5.32),

$$n_i\bar e_i = n_i(\bar c_i - \hat u_i) = \frac{\sigma^2}{\sigma_u^2 + \sigma^2/n_i}\,\bar c_i = \frac{\sigma^2}{\sigma_u^2}\,\hat u_i. \qquad (5.48)$$

This implies that the direction of maximum curvature is proportional to \hat u, the
vector of BLUPs. Thus, we see that the form of \Delta has an interesting consequence
when the intercept is the only fixed effect in the model: no matter what linear
combination is chosen, the direction of maximum curvature is the same. This
anomaly does not occur when \beta has more than one element.
5.8.4 An Alternative Cluster Variance Perturbation Scheme

In this section, the variance perturbation scheme proposed by Beckman et al.
(1987) is addressed. For the one-way random effects model, the perturbation is
V(u_i) = \omega_i\sigma_u^2. As each \omega_i approaches infinity, V(u_i) \to \infty and V(y_{ij}) \to \infty. Clusters
perturbed to have higher random effects variance are deemed to provide less reliable
information for inference. Also, the correlation between observations in the same
cluster approaches 1 as \omega_i \to \infty:

$$\mathrm{Corr}(y_{ij}, y_{ij'}) = \frac{\omega_i\sigma_u^2}{\omega_i\sigma_u^2 + \sigma^2} \longrightarrow 1.$$

Thus, observations from clusters that are perturbed to have higher random effects
variance are deemed more alike, while observations from clusters perturbed to have
smaller random effects variance are deemed less alike. This is in contrast to the
perturbation given previously, which preserves the intra-cluster correlations. Also,
the alternative perturbation does not reweight the clusters' entire contributions to the
likelihood. Since it does not correspond to a cluster weight, it provides diagnostic
information about the assumption of equal cluster variance rather than the influence
of clusters.
By using V(u_i) = \frac{1}{\omega_i}\sigma_u^2 rather than V(u_i) = \omega_i\sigma_u^2, we can obtain v_{\max} for
this perturbation from results for the first cluster perturbation. Using the partial
derivatives from Section 5.8.2, we may derive

$$\Delta = \begin{bmatrix} 0_{p_1\times g} \\ -G^{-1}D(\hat u) \end{bmatrix}.$$
The acceleration matrix is then

$$\begin{aligned} \ddot M &= -2\Delta'\bar h^{-1}\Delta \\ &= 2\begin{bmatrix} 0 & D(\hat u)'G^{-1} \end{bmatrix}\begin{bmatrix} h^{11} & h^{12} \\ h^{21} & h^{22} \end{bmatrix}\begin{bmatrix} 0 \\ G^{-1}D(\hat u) \end{bmatrix} \\ &= 2D(\hat u)'G^{-1}h^{22}G^{-1}D(\hat u) \\ &= 2D(\hat u)'G^{-1}\big((Z'R^{-1}Z + G^{-1})^{-1} + GZ'V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}ZG\big)G^{-1}D(\hat u). \end{aligned} \qquad (5.49)$$
This matrix has at most p_2 distinct nonzero eigenvalues regardless of the number of
fixed effects. In addition, using the results given in Section 5.5.3, the acceleration
matrix for influence measure PLD^{[u]}(\omega) is equivalent to (5.49). Finally, the
acceleration matrix for PLD^{[\beta]}(\omega) is

$$\begin{aligned} \ddot M &= -2\begin{bmatrix} 0 & D(\hat u)'G^{-1} \end{bmatrix}\left(\bar h^{-1} - \begin{bmatrix} 0 & 0 \\ 0 & \bar h_{22}^{-1} \end{bmatrix}\right)\begin{bmatrix} 0 \\ G^{-1}D(\hat u) \end{bmatrix} \\ &= 2D(\hat u)'Z'V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}ZD(\hat u). \end{aligned} \qquad (5.50)$$
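The closed form (5.49) can be verified against the defining expression -2\Delta'\bar h^{-1}\Delta. A sketch on simulated one-way data (the design and variance values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(8)
sizes = [2, 3, 4]
n, g = sum(sizes), len(sizes)
p1 = 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.zeros((n, g))
row = 0
for i, ni in enumerate(sizes):
    Z[row:row + ni, i] = 1.0
    row += ni
sigma2, sigma2_u = 1.5, 2.0
G, R = sigma2_u * np.eye(g), sigma2 * np.eye(n)
Rinv, Ginv = np.linalg.inv(R), np.linalg.inv(G)
V = Z @ G @ Z.T + R
Vinv = np.linalg.inv(V)
y = rng.normal(size=n)

beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
u = G @ Z.T @ Vinv @ (y - X @ beta)

# Delta for the alternative scheme, and M = -2 Delta' hbar^{-1} Delta.
Delta = np.vstack([np.zeros((p1, g)), -Ginv @ np.diag(u)])
hbar = -np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
                  [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv]])
M = -2.0 * Delta.T @ np.linalg.solve(hbar, Delta)

# Closed form (5.49): 2 D(u)' G^{-1} h^{22} G^{-1} D(u).
h11 = np.linalg.inv(X.T @ Vinv @ X)
h22 = (np.linalg.inv(Z.T @ Rinv @ Z + Ginv)
       + G @ Z.T @ Vinv @ X @ h11 @ X.T @ Vinv @ Z @ G)
M_closed = 2.0 * np.diag(u) @ Ginv @ h22 @ Ginv @ np.diag(u)
print(np.allclose(M, M_closed))   # True
```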
As a special case, consider the one-way random effects model. Using Theorem
2.1, the direction of maximum curvature for \hat\mu is

$$v_{\max} \propto \Delta'\bar h^{-1}d_1 \propto D(\hat u)\begin{bmatrix} \frac{1}{\sigma_u^2+\sigma^2/n_1} \\ \vdots \\ \frac{1}{\sigma_u^2+\sigma^2/n_g} \end{bmatrix} = \begin{bmatrix} \frac{\hat u_1}{\sigma_u^2+\sigma^2/n_1} \\ \vdots \\ \frac{\hat u_g}{\sigma_u^2+\sigma^2/n_g} \end{bmatrix}. \qquad (5.51)$$
Here, the elements corresponding to larger clusters receive more weight. This is in
contrast to the first cluster perturbation scheme, for which v_{\max} \propto \hat u.

The direction of maximum curvature for \hat u_1 is given by

$$v_{\max} \propto \Delta'\bar h^{-1}d_2 \propto D(\hat u)\begin{bmatrix} \frac{k\sigma^2}{n_1} + \frac{\sigma_u^2}{\sigma_u^2+\sigma^2/n_1} \\ \frac{\sigma_u^2}{\sigma_u^2+\sigma^2/n_2} \\ \vdots \\ \frac{\sigma_u^2}{\sigma_u^2+\sigma^2/n_g} \end{bmatrix} = \begin{bmatrix} \big(\frac{k\sigma^2}{n_1} + \frac{\sigma_u^2}{\sigma_u^2+\sigma^2/n_1}\big)\hat u_1 \\ \frac{\sigma_u^2}{\sigma_u^2+\sigma^2/n_2}\,\hat u_2 \\ \vdots \\ \frac{\sigma_u^2}{\sigma_u^2+\sigma^2/n_g}\,\hat u_g \end{bmatrix}, \qquad (5.52)$$

where k was defined after (5.35). The coefficient for the first element is approximately
1 + g/n_1 times larger than the others, indicating that influence on the i-th BLUP is
accomplished largely by perturbing the variance of u_i.
The direction of maximum curvature for \hat\mu + \hat u_1 is even more dominated by the first
element:

$$v_{\max} \propto \Delta'\bar h^{-1}(d_1 + d_2) \propto \begin{bmatrix} \Big(\displaystyle\sum_{i=2}^{g}\frac{1}{\sigma_u^2+\sigma^2/n_i}\Big)\hat u_1 \\[6pt] -\dfrac{1}{\sigma_u^2+\sigma^2/n_2}\,\hat u_2 \\ \vdots \\ -\dfrac{1}{\sigma_u^2+\sigma^2/n_g}\,\hat u_g \end{bmatrix}. \qquad (5.53)$$

Here, the first element is approximately g - 1 times larger than the other elements.
5.9 Application to Corn Crop Data
In this section, the residual variance perturbation and cluster variance
perturbation schemes are applied to an agricultural data set. After briefly reviewing
the data set and goals of the analysis, local influence techniques are applied.
Graphical results indicate that the data set has an influential observation.
5.9.1 Data

Battese et al. (1988) analyzed reported areas of corn crops from 12 Iowa
counties using a variance components model. Keeping the same notation as in
Battese et al. (1988), the model is

$$y_{ij} = \beta_0 + u_i + \beta_{corn}x_{1ij} + \beta_{soy}x_{2ij} + e_{ij},$$

where

y_{ij} = reported hectares of corn in segment j from county i,
x_{1ij} = no. of pixels of corn in segment j from county i as per satellite imagery,
x_{2ij} = no. of pixels of soy in segment j from county i as per satellite imagery,
u_i ~ i.i.d. N(0, \sigma_u^2),
e_{ij} ~ i.i.d. N(0, \sigma^2),
i = 1,\ldots,12, \qquad j = 1,\ldots,n_i,
\{n_i\} = (1, 1, 1, 2, 3, 3, 3, 3, 4, 5, 5, 6), \qquad \sum n_i = 37.
Of particular note in the data set is observation 33. The reported hectares
for this case are exactly the same as those of the previous observation despite
having vastly different covariate values. Battese et al. (1988) felt that the value was
misrecorded and did not include it in their analysis.
Appendix B contains model-fitting information from an analysis in which
the observation is included. Note that the regressors are also centered and scaled
for this analysis. Additionally, the variance components have been estimated by
restricted maximum likelihood (REML). By not perturbing the REML estimating
functions, the local influence techniques described in this chapter can be used to
assess the effects of perturbations on \hat\beta and \hat u while effectively treating the variance
components as known.
The goal of the analysis is to first establish a linear relationship between
the satellite data, which are readily available, and the actual reported data, which
are accurate but not available for all segments in all counties. The resulting model
can then be used to make predictions for segments in which reported hectares are
unavailable.

Interest centers on predicting the population mean hectares of corn per
segment. Because the satellite data are available for all segments in each of the 12
counties, the population mean number of pixels for corn and soy in county i is known.
These means are denoted \bar x_{1i(p)} and \bar x_{2i(p)}, respectively. The corresponding
population mean hectares of corn per segment in county i is then defined as

$$\mu_i = \beta_0 + u_i + \beta_{corn}\bar x_{1i(p)} + \beta_{soy}\bar x_{2i(p)}.$$

Point inferences for these values can be made using the corresponding BLUPs:

$$\hat\mu_i = \hat\beta_0 + \hat u_i + \hat\beta_{corn}\bar x_{1i(p)} + \hat\beta_{soy}\bar x_{2i(p)}.$$
5.9.2 Residual Analysis

Figure 5.1 contains diagnostic plots of the corn data. The first plot shows the
37 composite residuals c = y - X\hat\beta. The three single-observation counties appear
first, followed by the county with two observations, etc. Cases from the same county
(cluster) are joined by a line, a practice suggested by Hilden-Minton (1995). This
plot indicates that observation 33 is the most outlying from the overall estimated
regression curve, X\hat\beta. Figure 5.1(b) shows the residuals e = y - X\hat\beta - Z\hat u. This
shows that even when taking into account the county effects, observation 33 is
the most outlying. Figure 5.1(c) shows both sets of residuals together, with the
composite residuals represented by stars and the residuals represented by triangles.
We note that the largest difference among them is for counties 5 and 11, where the
composite residuals are nearly all smaller. Not surprisingly, counties 5 and 11 have
the largest BLUPs in Figure 5.1(d). Also, because the residual for case 33 is large
and of a different sign than the other residuals for county 12, it appears that \hat u_{12}
would be considerably larger if not for observation 33. A local influence analysis can
assess this possibility more thoroughly.
5.9.3 Scheme 1: Residual Variance Perturbation

Figure 5.2 contains local influence diagnostics for perturbing the error
variance in the corn model. The first plot shows the basic curvatures, C_b, that
result from one-at-a-time perturbations. Observation 33 is considerably more
influential than the others owing to its noticeably large curvature. The element
for case 33 dominates the direction of largest curvature (Figure 5.2(b)). If all
elements were contributing equally, they would equal either 1/\sqrt{37} = .164 or
-.164. Therefore, the value for element 33 is over five times larger than the value
of equally contributing elements (e.c.e.). Plots (c), (d), and (e) are the directions of
maximum curvature for directed influence on the intercept, the CORN slope, and the
SOY slope, respectively. The plots indicate that case 33 is a dominant contributor
to the SOY slope in particular. Plot (f) displays the largest curvatures for directed
influence on the 12 predicted random effects. This plot shows which BLUPs are
most sensitive to perturbation. Here we see that county 12's BLUP is the most
sensitive to the case weights. This is the county containing the questionable data
point.
[Figure 5.1: Corn data; diagnostic plots. (a) Composite residuals; (b) Residuals; (c) Residuals and composite residuals; (d) County BLUPs.]
[Figure 5.2: Corn data; Scheme 1; local influence diagnostics. (a) Basic curvatures; (b) v_max; (c) v_max for \beta_0; (d) v_max for \beta_{corn}; (e) v_max for \beta_{soy}; (f) C_max's for each of the \hat u_i.]
Figure 5.3 shows the directional vectors v_max for directed influence on the
individual BLUPs. Typically, it would be unnecessary to examine all of these plots,
especially if there are a large number of random effects in the model. Nonetheless,
all plots have been included here to more fully demonstrate their behavior. Upon
inspection, maximum perturbation of \hat u_i is generally achieved by perturbing cases in
cluster i. This is not surprising given the closed-form results for the one-way random
effects model in Section 5.8.3. Only in plots (a) and (g) do we see an observation
in another county exerting influence. In both cases, observation 33 is the influential
out-of-cluster data point.
Figure 5.4 contains local influence diagnostics associated with point inferences
for linear combinations of fixed and random effects. Recall that in Chapter 2,
MaxFITS was proposed as a measure of how sensitive fitted values are to residual
variance perturbation, while BasicFITS was proposed as a measure of how much a
single observation determines its fitted value. These diagnostics are C_max and C_b
values for \ddot M, respectively. Figure 5.4(a) shows MaxFITS (stars) and BasicFITS
(triangles). The values for MaxFITS indicate that the predictions for cases 33
and 34 are noticeably more sensitive to perturbation than the other predictions.
The sensitivity of prediction 33 is largely a function of perturbing this case, as
demonstrated by the fact that its BasicFITS value is nearly as large as its MaxFITS
value. This is confirmed by v_max for directed influence on the fitted value of case 33:
element 33 is by far the largest contributor (plot (c)). However, the sensitivity of
case 34's prediction is not largely a function of perturbing case 34, as demonstrated
by a moderately sized BasicFITS value. The source of the influence is in fact case 33,
as is evident in the plot of v_max for directed influence on the fitted value of case 34
(plot (d)). In short, the fitted values for observations 33 and 34 are both sensitive
because they are greatly influenced by the model's assumption that observation 33
is just as reliable as the rest of the data.
[Figure 5.3: Corn data; Scheme 1; directed influence on individual BLUPs. Panels (a) through (l) show v_max for \hat u_1 through \hat u_{12}.]
Of course, the main goal of the analysis is to obtain predictions for the
mean hectares per segment in each of the 12 counties. To determine whether these
predictions are sensitive to Scheme 1, a variation of BasicFITS and MaxFITS is
used. First, the 12 C_max values for directed influence on county mean predictions
are obtained. These are used to assess the relative sensitivity of the predictions.
Next, the 37 basic curvatures, C_b, are calculated for individual segments as directed
upon the prediction for the corresponding county. This allows us to examine how
much one-at-a-time perturbations within a cluster affect the prediction of that
cluster's predicted mean. Finally, the 12 C_max values and the 37 C_b values are
plotted together (Figure 5.4(b)). Here the C_b values are plotted against case indices,
and the C_max values are plotted against the case indices of observations in the
corresponding county. (Plot (b) is essentially a BasicFITS/MaxFITS plot when only
one prediction is of interest in each cluster.) The implications for the corn analysis
are twofold. First, the prediction for county 12 is clearly the most sensitive, and
this is due largely to the effects of perturbing case 33. Second, the influence that
case 33 showed on \hat\beta_{soy}, \hat u_1, and \hat u_7 (Figures 5.2e, 5.3a, 5.3g) does not translate into
noticeable influence on the overall predicted means in the other 11 counties.
5.9.4 Scheme 2: Clusterbased Variance Perturbation
Having seen the influence of perturbing case 33's variance, we may want to
assess the effects of perturbing the variance associated with county 12. To this end, we
implement the cluster-based perturbation scheme suggested in Section 5.8.3. Since
there are only 12 perturbations, we implement the strategy mentioned in Section
2.6, including the benchmarks ±2/√12 = ±.577 and ±3/√12 = ±.866 as horizontal
lines. This should prevent us from confusing somewhat large elements with truly
prominent ones. Also, directional vectors are rendered as star plots as well as index
plots. Recall that a star plot is merely an alternative representation that suppresses
the signs of the elements, but does not give the illusion of ordering that index plots
(a) BasicFITS and MaxFITS; (b) Cmax and Cb for predicted mean hectares per segment; (c) vmax for y33; (d) vmax for y34
Figure 5.4: Corn data; Scheme 1; directed influence on predictions
do. Each element is displayed as a point, with the distance from the origin equal to
its absolute value. The plot is completed by labelling the points with indices and
drawing rings to represent the benchmarks 2/√12 and 3/√12.
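The benchmark logic can be sketched as follows, assuming only that an equally contributing unit-length n-vector has elements of size 1/√n; the helper name and the flagging rule are illustrative:

```python
import math

def eigenvector_benchmarks(v, m=3):
    """Benchmark the elements of a directional (eigen)vector v of length n.

    If all n elements contributed equally, each would be 1/sqrt(n) in absolute
    value; rings (or horizontal lines) are drawn at k/sqrt(n) for k = 1..m, and
    each element is scored by the number of rings it exceeds (0..m).
    """
    n = len(v)
    rings = [k / math.sqrt(n) for k in range(1, m + 1)]
    flags = [sum(abs(x) > r for r in rings) for x in v]
    return rings, flags
```

For n = 12 this reproduces the rings at .289, .577, and .866 used in the plots: an element scoring 3 stands well clear of the "equally contributing" level, while scores of 0 or 1 are likely noise.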
Figure 5.5 contains index plots and star plots of the three distinct
eigenvectors resulting from the cluster perturbation. Element 12 of vmax is almost
three times the value of equally contributing elements (plots (a) and (b)). Element
11 of v2 is nearly two times that of e.c.e. (plots (c) and (d)), while element 8 of v3
is about two times that of e.c.e. (plots (e) and (f)). Nonetheless, the second two
eigenvectors tend to be more noise than signal.

To explore further how cluster 12 is affecting inference, Figure 5.6 gives
directions of maximum curvature for directed influence on each of the individual
BLUEs under the original parameterization. Element 12, together with element 11,
appears prominently in vmax for both of the slope estimates. This indicates that the
slopes, rather than the intercept, are being influenced by county 12.

To assess effects of the perturbation on the set of ui, Figure 5.7 gives the
three eigenvectors of the corresponding acceleration matrix. The direction of
largest curvature shows that element 11 is more than two times that of equally
contributing elements, while element 12 is roughly three times that of e.c.e. in the
second eigenvector. Finally, element 8 is most prominent in the third eigenvector.
Because element 12 stands out in v2, but not vmax, these plots are inconclusive as
to whether county 12 is influential as a whole on the BLUPs.

Finally, we address the main question of interest: whether or not the
predicted mean hectares per segment are sensitive to the perturbation. Since there
is only one prediction per county, we can use the perturbation scheme to produce
cluster-based analogs of BasicFITS and MaxFITS (Figure 5.8). The MaxFITS values
show that all of the predicted means show comparable sensitivity to perturbation.
(a) vmax; (b) |vmax| star plot; (c) v2; (d) |v2| star plot; (e) v3; (f) |v3| star plot
Figure 5.5: Corn data; Scheme 2; local influence on
(a) vmax for β0; (b) |vmax| for β0 star plot; (c) vmax for βcorn; (d) |vmax| for βcorn star plot; (e) vmax for βsoy; (f) |vmax| for βsoy star plot
Figure 5.6: Corn data; Scheme 2; vmax's for individual BLUEs
The small BasicFITS values indicate that none of the counties is self-predicting,
not even county 12.
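When each cluster contributes exactly one perturbation and one prediction, the cluster-based analogs reduce to two curvatures per cluster. A minimal sketch under that assumption (names are illustrative; `F_list[j]` is assumed to hold the acceleration matrix for directed influence on cluster j's prediction under the cluster-based scheme):

```python
import numpy as np

def cluster_fits(F_list):
    """Cluster-based analogs of BasicFITS and MaxFITS.

    For each cluster j:
      MaxFITS_j   = 2 * |largest-magnitude eigenvalue of F_j|  (worst-case
                    direction of perturbation),
      BasicFITS_j = 2 * |F_j[j, j]|  (perturbing cluster j alone).
    A cluster is 'self-predicting' when BasicFITS_j approaches MaxFITS_j.
    """
    basic_fits, max_fits = [], []
    for j, F in enumerate(F_list):
        eig = np.linalg.eigvalsh(F)                 # F symmetric
        max_fits.append(2.0 * np.max(np.abs(eig)))
        basic_fits.append(2.0 * abs(F[j, j]))
    return np.array(basic_fits), np.array(max_fits)
```

Plotting the two series together gives the Figure 5.8 style display: small BasicFITS values relative to MaxFITS, as seen here, mean no single cluster's own perturbation dominates its prediction.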
5.9.5 Scheme 3: Alternate Cluster-based Variance Perturbation
The alternate cluster variance scheme can also be applied to the data set. The
acceleration matrix of the likelihood displacement has twelve nonzero eigenvalues.
The first two eigenvectors appear in Figure 5.9 as index plots and star plots, and both
vectors implicate elements 5 and 11 as being influential. The other ten eigenvectors
are dominated by a single unique element, and are not shown.

Figure 5.10 shows the directions of maximum curvature for directed influence
on each of the individual BLUEs. Although a couple of elements exceed the 2/√12
benchmark, the plots are relatively noisy. The small value for element 12 in all
of the plots suggests that perturbing county 12's random effect variance has little
effect on the fixed effects estimates.
Finally, Figure 5.11 provides the BasicFITS/MaxFITS plot for the alternate
cluster perturbation. The first thing we notice about the plot is that the BasicFITS
values are nearly as large as the MaxFITS values. This suggests that each predicted
mean is most sensitive to the perturbation of its own V(ui). We also recall that for
the simple random sample, the ith element of vmax for the ith predicted mean was
roughly g - 1 times larger than the other elements. Hence, Figure 5.11 suggests
that this dominance may be maintained in more complicated models. If this is
true, it may be sufficient to only plot the MaxFITS values. For the corn data, the
MaxFITS values indicate that the predicted means for counties with large BLUPs
(counties 5 and 11) are most sensitive to the perturbation of the cluster variance.
5.9.6 Conclusions and Action
Having examined these plots, the analyst should be very wary of observation
33. Scheme I indicates that if its precision varies from that of the other segments
(a) vmax; (b) |vmax| star plot; (c) v2; (d) |v2| star plot; (e) v3; (f) |v3| star plot
Figure 5.7: Corn data; Scheme 2; local influence on u
Figure 5.8: Corn data; Scheme 2; local influence on predicted means: cluster-based
BasicFITS and MaxFITS
(a) vmax; (b) |vmax| star plot; (c) v2; (d) |v2| star plot
Figure 5.9: Corn data; Scheme 3; local influence on
in county 12, the county's predicted mean hectares of corn per segment is
affected (Figure 5.4(b)). Using the weighted county perturbation scheme leads
to inconclusive evidence about county 12's overall influence. Although element
12 figures prominently in the vmax's for the two slope estimates (Figure 5.6),
even the 12th county's predicted mean appears unaffected (Figure 5.8). Influence
diagnostics for perturbing cluster variances (Figures 5.9, 5.10, and 5.11) suggest
that the assumption of equal variance for the 12th random effect is not influential.
Therefore, it is deemed that differential weighting of observation 33 is the main
cause of sensitivity, and since the value was considered suspect a priori, deletion or
substantial downweighting of this single observation seems warranted.
(a) vmax for β0; (b) |vmax| for β0 star plot; (c) vmax for βcorn; (d) |vmax| for βcorn star plot; (e) vmax for βsoy; (f) |vmax| for βsoy star plot
Figure 5.10: Corn data; Scheme 3; vmax's for individual BLUEs
Figure 5.11: Corn data; Scheme 3; influence on predicted means: basic curvatures
and maximum curvatures
CHAPTER 6
MULTISTAGE ESTIMATIONS AND FUTURE RESEARCH
In this chapter, several areas of future work are outlined, most of which stem
from the generalization of local assessment to estimating functions in Chapter 4.
First, some preliminary results are given for extending the technique to multistage
estimations. Then, a synopsis of the dissertation is presented along with some
additional areas of future research.
6.1 Local Assessment in Multistage Estimations
Sometimes inference takes place in more than one formal stage. One example
is linear mixed models when variance components are unknown. Restricted/residual
maximum likelihood (REML) (Patterson and Thompson 1971; Robinson 1987;
Searle et al. 1992) can be used to estimate the variance parameters. Then the mixed
model equations are used to estimate β and predict u, with the estimated variance
parameters treated as known. Local influence techniques can be used to assess the
effects of perturbations on functions of the two sets of parameter estimates using
results from Chapter 4.

First, let θ̂1 denote the solution of p1 estimating equations h1(θ1; y) = 0.
Further, let θ̂2 be the solution of p2 estimating equations h2(θ2; y, θ̂1) = 0 for
fixed θ̂1. Perturbation leads to two sets of perturbed equations, h1,ω(θ1; y) = 0
and h2,ω(θ2; y, θ̂1,ω) = 0, with solutions θ̂1,ω and θ̂2,ω respectively. The effects
of perturbation are measured by m, which depends on ω explicitly and via θ̂1,ω
and θ̂2,ω. The algebraic expressions and computational formulas for first- and
second-order influence given in Chapter 4 can be used by simply using the notation
θ' = (θ1', θ2').
Because h1,ω(θ1; y) does not depend on θ̂2,ω, there are some simplifications
in the computational formulas. For instance, using results on partitioned matrices
(Searle 1982),

$$
\dot{h}^{-1} = \begin{bmatrix} \dot{h}_{11} & 0 \\ \dot{h}_{21} & \dot{h}_{22} \end{bmatrix}^{-1}
= \begin{bmatrix} \dot{h}_{11}^{-1} & 0 \\ -\dot{h}_{22}^{-1}\,\dot{h}_{21}\,\dot{h}_{11}^{-1} & \dot{h}_{22}^{-1} \end{bmatrix}. \tag{6.1}
$$

Partitioning $\Delta$ leads to a simplification in $J$:

$$
J = \frac{\partial \hat{\theta}_{\omega}}{\partial \omega'}
= -\dot{h}^{-1} \begin{bmatrix} \Delta_1 \\ \Delta_2 \end{bmatrix}
= \begin{bmatrix} -\dot{h}_{11}^{-1}\,\Delta_1 \\ \dot{h}_{22}^{-1}\left(\dot{h}_{21}\,\dot{h}_{11}^{-1}\,\Delta_1 - \Delta_2\right) \end{bmatrix}. \tag{6.2}
$$
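These block formulas can be checked numerically. The sketch below builds the inverse and J blockwise and compares them against direct inversion; the blocks are arbitrary invertible test data and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
p1, p2, q = 2, 3, 4                     # dimensions of theta1, theta2, omega

# Jacobian blocks of the stacked estimating equations; the (1,2) block is
# zero because h1 does not involve theta2 (values are arbitrary test data,
# shifted toward the identity to keep the blocks invertible).
h11 = rng.normal(size=(p1, p1)) + 5 * np.eye(p1)
h21 = rng.normal(size=(p2, p1))
h22 = rng.normal(size=(p2, p2)) + 5 * np.eye(p2)
h = np.block([[h11, np.zeros((p1, p2))], [h21, h22]])

# Block form of the inverse, as in (6.1)
h11i = np.linalg.inv(h11)
h22i = np.linalg.inv(h22)
h_inv = np.block([[h11i, np.zeros((p1, p2))],
                  [-h22i @ h21 @ h11i, h22i]])

# J = -h^{-1} Delta, computed blockwise as in (6.2)
D1 = rng.normal(size=(p1, q))
D2 = rng.normal(size=(p2, q))
J_block = np.vstack([-h11i @ D1, h22i @ (h21 @ h11i @ D1 - D2)])
J_direct = -np.linalg.inv(h) @ np.vstack([D1, D2])
```

Working blockwise avoids inverting the full (p1 + p2)-dimensional Jacobian: only the two diagonal blocks need to be inverted.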
6.2 Synopsis and Additional Future Research

The local influence approach to diagnostics is largely unknown, owing in part
to the lack of introductory resources on the technique. It is hoped that the extended
and detailed discussion in the first two chapters of this thesis has shed some light
on this powerful and practical methodology. The diagnostics derived for perturbing
the entire X matrix in Chapter 3 give previously unavailable detail about how
collinearity and overfitting can affect estimation. The results should encourage
researchers to utilize perturbations other than just case-weights and place more
emphasis on finding closed forms for the resulting diagnostics. Because analysts are
unaccustomed to using directional vectors, having interpretable formulas for special
cases will help them understand and appreciate the results.
The foundation of the local influence methodology is a Taylor series
approximation to a high-dimensional surface, combined with some familiar geometric
concepts: slopes and accelerations. Using these ideas, the generalization to a class of
influence measures for optimal estimating functions in Chapter 4 is straightforward.
This extension of the methodology provides many avenues for future research;
quasi-likelihood and mixed model estimation are only the beginning of the areas of
application.
As for other areas of research, the idea of perturbations is an excellent
concept for linking influence diagnostics, sensitivity analysis, robust inference,
and aspects of optimal design. Robust estimation techniques are those that yield
inferences that are largely insensitive to perturbations. Consequently, minor aspects
of the data, model, and inference technique do not overly influence scientific
conclusions. If a perturbation and influence measure are appropriate for several
methods of inference, the maximum local influence could be used to compare the
techniques' robustness. In fact, minimizing the maximum local influence could even
serve as a criterion for robustness (minimax LI estimation). Optimal sampling and
experimental designs, which are efficient under assumed conditions, should also be
designed so that they are insensitive to perturbation. Local assessment could be
used to assess how sampling and experimental designs can ameliorate or compound
influential phenomena.
APPENDIX A
MODEL FITTING INFORMATION FOR SCHOOL DATA
NOTE: No intercept in model. R-square is redefined.
Dependent Variable: VSCORE
Analysis of Variance
                      Sum of       Mean
Source       DF      Squares     Square    F Value    Prob>F
Model         5     17.21982    3.44396     29.019    0.0001
Error        15      1.78018    0.11868
U Total      20     19.00000

Root MSE     0.34450    R-square    0.9063
Dep Mean     0.00000    Adj R-sq    0.8751
C.V.      6.745588E16
Parameter Estimates

                Parameter      Standard    T for H0:
Variable   DF    Estimate         Error  Parameter=0   Prob > |T|
SALARY      1    0.139993    0.09301762        1.505       0.1531
WCOLR       1    0.194122    0.22907761        0.847       0.4101
SES         1    0.919607    0.14859782        6.189       0.0001
TSCORE      1    0.250736    0.09464627        2.649       0.0182
MOMED       1    0.203697    0.22031344        0.925       0.3698
               Variance
Variable   DF  Inflation
SALARY      1  1.38519879
WCOLR       1  8.40130895
SES         1  3.53513942
TSCORE      1  1.43413046
MOMED       1  7.77076288
Collinearity Diagnostics

                    Condition  Var Prop  Var Prop  Var Prop  Var Prop  Var Prop
Number  Eigenvalue      Index    SALARY     WCOLR       SES    TSCORE     MOMED
     1     2.83681    1.00000    0.0134    0.0129    0.0296    0.0071    0.0142
     2     1.39508    1.42598    0.2192    0.0040    0.0020    0.2456    0.0026
     3     0.49664    2.38997    0.7601    0.0006    0.0064    0.6485    0.0004
     4     0.20253    3.74262    0.0011    0.0655    0.9471    0.0418    0.1258
     5     0.06894    6.41490    0.0062    0.9170    0.0149    0.0570    0.8570
       Dep Var  Predict  Std Err            Std Err   Student
Obs     VSCORE    Value  Predict  Residual  Residual  Residual
  1     0.3314   0.2714    0.227    0.0600     0.260     0.231
  2     1.4737   1.4135    0.228    0.0602     0.259     0.233
  3     0.2454   0.9244    0.099    0.6790     0.330     2.058
  4     0.9657   1.0471    0.120    0.0814     0.323     0.252
  5     0.3468   0.2126    0.123    0.1342     0.322     0.417
  6     0.2033   0.1885    0.231    0.0147     0.256     0.058
  7     1.1548   1.0312    0.150    0.1236     0.310     0.398
  8     0.2892   0.2146    0.083    0.0747     0.334     0.223
  9     1.0190   0.9116    0.167    0.1073     0.301     0.356
 10     0.3640   0.3278    0.260    0.0362     0.226     0.160
 11     2.0255   1.6459    0.169    0.3796     0.300     1.265
 12     0.0202   0.2800    0.205    0.3002     0.277     1.083
 13     0.0314   0.1489    0.195    0.1803     0.284     0.634
 14     0.3408   0.2814    0.084    0.0594     0.334     0.178
 15     2.1287   1.8229    0.188    0.3058     0.289     1.058
 16     0.7938   0.5707    0.113    0.2230     0.326     0.685
 17     0.5643   0.3168    0.169    0.2475     0.300     0.825
 18     0.5815   1.4413    0.181    0.8598     0.293     2.932
 19     1.3783   1.1853    0.167    0.1930     0.301     0.640
 20     1.0190   0.9738    0.143    0.0452     0.313     0.144
       Cook's            Hat Diag     Cov
Obs         D  Rstudent         H   Ratio   Dffits
  1     0.008    0.2236    0.4325  2.4440   0.1952
  2     0.008    0.2251    0.4362  2.4596   0.1980
  3     0.077    2.3475    0.0831  0.2929   0.7067
  4     0.002    0.2441    0.1212  1.5729   0.0906
  5     0.005    0.4055    0.1281  1.5276   0.1555
  6     0.001    0.0557    0.4499  2.5639   0.0504
  7     0.007    0.3868    0.1891  1.6511   0.1868
  8     0.001    0.2160    0.0575  1.4734   0.0534
  9     0.008    0.3457    0.2354  1.7697   0.1918
 10     0.007    0.1545    0.5680  3.2410   0.1771
 11     0.102    1.2931    0.2414  1.0588   0.7295
 12     0.128    1.0900    0.3527  1.4516   0.8046
 13     0.038    0.6212    0.3193  1.8106   0.4254
 14     0.000    0.1720    0.0588  1.4844   0.0430
 15     0.094    1.0629    0.2964  1.3615   0.6899
 16     0.011    0.6725    0.1069  1.3486   0.2326
 17     0.043    0.8152    0.2406  1.4745   0.4589
 18     0.654    4.3370    0.2756  0.0276   2.6748
 19     0.025    0.6272    0.2346  1.6059   0.3472
 20     0.001    0.1394    0.1727  1.6948   0.0637
        SALARY    WCOLR      SES   TSCORE    MOMED
Obs    Dfbetas  Dfbetas  Dfbetas  Dfbetas  Dfbetas
  1     0.1627   0.0882   0.0679   0.0276   0.0339
  2     0.0660   0.1443   0.0717   0.0126   0.1444
  3     0.0734   0.0199   0.0654   0.2036   0.2487
  4     0.0033   0.0428   0.0220   0.0051   0.0563
  5     0.0769   0.1122   0.1070   0.0453   0.0394
  6     0.0036   0.0155   0.0154   0.0418   0.0105
  7     0.0748   0.1413   0.0036   0.0380   0.0973
  8     0.0312   0.0014   0.0220   0.0142   0.0250
  9     0.0130   0.1498   0.0105   0.1046   0.1325
 10     0.1007   0.0145   0.0400   0.1489   0.0279
 11     0.2465   0.0798   0.3961   0.1611   0.2056
 12     0.0446   0.2661   0.4139   0.2827   0.6298
 13     0.1588   0.2764   0.3238   0.0694   0.0699
 14     0.0169   0.0145   0.0204   0.0276   0.0236
 15     0.0457   0.0462   0.6017   0.1181   0.1775
 16     0.1471   0.0342   0.0379   0.0840   0.0415
 17     0.4650   0.0429   0.0595   0.2269   0.0573
 18     0.2034   1.1197   1.8118   0.0013   1.9401
 19     0.1005   0.0353   0.1001   0.0803   0.1385
 20     0.0336   0.0341   0.0005   0.0048   0.0165
APPENDIX B
MODEL FITTING INFORMATION FOR CORN CROP DATA
The MIXED Procedure

Class Level Information

Class     Levels  Values
CNTYNUM       12  1 2 3 4 5 6 7 8 9 10 11 12

REML Estimation Iteration History

Iteration  Evaluations     Objective     Criterion
        0            1  243.72248590
        1            2  242.59476305    0.00000000

Convergence criteria met.

Covariance Parameter Estimates (REML)

Cov Parm       Estimate    Std Error     Z  Pr > Z
CNTYNUM     63.28953258  75.07490859  0.84  0.3992
Residual   297.72802207  84.64939011  3.62  0.0004

Asymptotic Covariance Matrix of Estimates

Cov Parm   Row         COVP1         COVP2
CNTYNUM      1  5636.2419002   2123.731245
Residual     2   2123.731245  7148.5993683

Model Fitting Information for HECTCORN

Description                        Value
Observations                          37
Res Log Likelihood               152.541
Akaike's Information Criterion   154.541
Schwarz's Bayesian Criterion     156.068
-2 Res Log Likelihood           305.0826
Solution for Fixed Effects

Effect         Estimate   Std Error  DF      t  Pr > |t|
INTERCEPT  120.74028537  3.78206463  11  31.92    0.0001
PIX_CORN    26.76987997  4.56957580  23   5.64    0.0001
PIX_SOY      2.04732793  4.56652013  23   0.45    0.6574
Solution for Random Effects

Effect   CNTYNUM    Estimate     SE Pred  DF     t  Pr > |t|
CNTYNUM        1  2.18380161  7.35943306  23  0.30    0.7693
CNTYNUM        2  1.47455727  7.30784520  23  0.20    0.8419
CNTYNUM        3  4.72910256  7.26641831  23  0.65    0.5216
CNTYNUM        4  2.76387525  6.88938048  23  0.40    0.6920
CNTYNUM        5  8.36866398  6.42653277  23  1.30    0.2057
CNTYNUM        6  4.27362586  6.52255117  23  0.66    0.5188
CNTYNUM        7  2.70476376  6.50319394  23  0.42    0.6813
CNTYNUM        8  1.15642520  6.47806611  23  0.18    0.8599
CNTYNUM        9  5.02562759  6.21203551  23  0.81    0.4268
CNTYNUM       10  2.88274430  5.92509787  23  0.49    0.6312
CNTYNUM       11  8.65058657  5.88291972  23  1.47    0.1550
CNTYNUM       12  0.75162907  5.69287011  23  0.13    0.8961
Tests of Fixed Effects

Source    NDF  DDF  Type III F  Pr > F
PIX_CORN    1   23       31.80  0.0001
PIX_SOY     1   23        0.20  0.6574

Predicted Values
HECTCORN  Predicted  SE Pred       L95       U95  Residual
165.7600   155.4869   9.7838  135.2477  175.7262   10.2731
 96.3200    89.3834   8.8869   70.9994  107.7674    6.9366
 76.0800    98.3267   8.1004   81.5698  115.0837   22.2467
185.3500   170.5417   8.5361  152.8834  188.2000   14.8083
116.4300   144.2402   7.5387  128.6451  159.8353   27.8102
162.0800   154.4196   7.0506  139.8343  169.0049    7.6604
152.0400   125.5822   6.7333  111.6532  139.5112   26.4578
161.7500   156.5001   7.1277  141.7553  171.2449    5.2499
 92.8800    91.0834   7.9300   74.6789  107.4879    1.7966
149.9400   131.2890   7.2587  116.2733  146.3048   18.6510
 64.7500    65.0934   8.7890   46.9121   83.2748    0.3434
127.0700   141.4215   7.1954  126.5367  156.3062   14.3515
133.5500   118.8645   7.1089  104.1587  133.5704   14.6855
 77.7000    90.7578   7.6708   74.8896  106.6260   13.0578
206.3900   184.9299   9.1302  166.0426  203.8172   21.4601
108.3300   118.7686   6.7585  104.7876  132.7496   10.4386
118.1700   123.7514   7.6313  107.9648  139.5380    5.5814
 99.9600   106.1059   7.3867   90.8254  121.3864    6.1459
140.4300   123.6154   6.2159  110.7567  136.4740   16.8146
 98.9500    91.7140   8.0456   75.0704  108.3575    7.2360
131.0400   125.3031   7.5450  109.6951  140.9111    5.7369
114.1200   123.9749   5.9584  111.6490  136.3009    9.8549
100.6000    97.0015   6.3382   83.8900  110.1130    3.5985
127.8800   139.1748   6.4405  125.8516  152.4980   11.2948
116.9000   107.4351   5.9034   95.2230  119.6473    9.4649
 87.4100    92.8847   6.9265   78.5562  107.2132    5.4747
 93.4800    85.2028   9.2386   66.0913  104.3142    8.2772
121.0000   138.6914   6.9531  124.3079  153.0749   17.6914
109.9100   127.4057   8.0247  110.8053  144.0061   17.4957
122.6600   129.0737   6.0804  116.4955  141.6519    6.4137
104.2100   111.5808   6.1297   98.9005  124.2610    7.3708
 88.5900    89.8508   6.3707   76.6720  103.0297    1.2608
 88.5900   139.1245   8.2468  122.0648  156.1843   50.5345
165.3500   142.4030   6.2021  129.5729  155.2330   22.9470
104.0000   106.1154   5.7054   94.3129  117.9179    2.1154
 88.6300    75.2417   8.4331   57.7964   92.6869   13.3883
163.7000   139.6604   6.3394  126.5464  152.7744   14.0396
REFERENCES
Atkinson, A. C. (1986), "Comments on 'Influential Observations, High Leverage Points, and Outliers in Linear Regression'," Statistical Science, 1, 397-402.
Barlow, W. E. (1997), "Global Measures of Local Influence for Proportional Hazards Regression Models," Biometrics, 53, 1157-1162.
Battese, G. E., Harter, R. M., and Fuller, W. A. (1988), "An Error-components Model for Prediction of County Crop Areas Using Survey and Satellite Data," Journal of the American Statistical Association, 83, 28-36.
Beckman, R. J., Nachtsheim, C. J., and Cook, R. D. (1987), "Diagnostics for Mixed-model Analysis of Variance," Technometrics, 29, 413-426.
Belsley, D. A., Kuh, E., and Welsch, R. E. (1980), Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, New York: Wiley.
Bhapkar, V. P. (1972), "On a Measure of Efficiency of An Estimating Equation," Sankhyā, Series A, Indian Journal of Statistics, 34, 467-472.
Billor, N. and Loynes, R. M. (1993), "Local Influence: A New Approach," Communications in Statistics, Part A - Theory and Methods, 22, 1595-1611.
Bjørnstad, J. F. (1990), "Predictive Likelihood: A Review (with discussion)," Statistical Science, 5, 242-265.
Booth, J. G. and Hobert, J. P. (1998), "Standard Errors of Prediction in Generalized Linear Mixed Models," Journal of the American Statistical Association, 93, 262-272.
Box, G. E. P. (1982), "Choice of Response Surface Design and Alphabetic Optimality," Utilitas Mathematica, 21, 11-55.
Brown, G. C. (1997), "Local Influence Regression Diagnostics for Gamma Data," Communications in Statistics, Part A - Theory and Methods, 26, 3051-3067.
Butler, R. W. (1986), "Predictive Likelihood Inference With Applications (with discussion)," Journal of the Royal Statistical Society, Series B, 48, 1-38.
Cadigan, N. G. (1995), "Local Influence in Structural Equation Models," Structural Equation Modeling, 2, 13-30.
Cadigan, N. G. and Farrell, P. J. (1999), "Expected Local Influence in the Normal Linear Regression Model," Statistics & Probability Letters, 41, 25-30.
Cook, R. D. (1977), "Detection of Influential Observation in Linear Regression," Technometrics, 19, 15-18.
Cook, R. D. (1979), "Influential Observations in Linear Regression," Journal of the American Statistical Association, 74, 169-174.
Cook, R. D. (1986), "Assessment of Local Influence (with discussion)," Journal of the Royal Statistical Society, Series B, 48, 133-169.
Cook, R. D. and Weisberg, S. (1991), "Added Variable Plots in Linear Regression," in Stahel, W. and Weisberg, S., editors, Directions in Robust Statistics and Diagnostics, Part I, pages 47-60, New York: Springer-Verlag.
Cox, D. R. and Reid, N. (1987), "Parameter Orthogonality and Approximate Conditional Inference (with discussion)," Journal of the Royal Statistical Society, Series B, 49, 1-39.
Crowder, M. (1986), "On Consistency and Inconsistency of Estimating Equations," Econometric Theory, 2, 305-330.
Crowder, M. (1987), "On Linear and Quadratic Estimating Functions," Biometrika, 74, 591-597.
Desmond, A. F. (1997a), "Optimal Estimating Functions, Quasi-likelihood and Statistical Modelling (with discussion)," Journal of Statistical Planning and Inference, 60, 77-121.
Desmond, A. F. (1997b), "Reply to Comments on 'Optimal Estimating Functions, Quasi-likelihood and Statistical Modelling'," Journal of Statistical Planning and Inference, 60, 116-121.
Escobar, L. A. and Meeker, W. Q., Jr. (1992), "Assessing Influence in Regression Analysis With Censored Data," Biometrics, 48, 507-528.
Farebrother, R. W. (1992), "Relative Local Influence and the Condition Number," Communications in Statistics, Part B - Simulation and Computation, 21, 707-710.
Firth, D. (1987), "On the Efficiency of Quasi-likelihood Estimation," Biometrika, 74, 233-245.
Fuller, W. A. (1987), Measurement Error Models, New York: Wiley.
Fung, W. K. and Kwan, C. W. (1997), "A Note on Local Influence Based on Normal Curvature," Journal of the Royal Statistical Society, Series B, 59, 839-843.
Fung, W. K. and Tang, M. K. (1997), "Assessment of Local Influence in Multivariate Regression Analysis," Communications in Statistics, Part A - Theory and Methods, 26, 821-837.
Galea, M., Paula, G. A., and Bolfarine, H. (1997), "Local Influence in Elliptical Linear Regression Models," The Statistician, 46, 71-79.
Geisser, S. (1991), "Diagnostics, Divergences and Perturbation Analysis," in Stahel, W. and Weisberg, S., editors, Directions in Robust Statistics and Diagnostics, Part I, pages 89-100, New York: Springer-Verlag.
Geisser, S. (1993), Predictive Inference: An Introduction, New York: Chapman & Hall.
Godambe, V. P. (1960), "An Optimum Property of Regular Maximum Likelihood Estimation," The Annals of Mathematical Statistics, 31, 1208-1212.
Godambe, V. P. (1976), "Conditional Likelihood and Unconditional Optimum Estimating Equations," Biometrika, 63, 277-284.
Godambe, V. P. (1991), "Orthogonality of Estimating Functions and Nuisance Parameters," Biometrika, 78, 143-151.
Godambe, V. P. and Heyde, C. C. (1987), "Quasi-likelihood and Optimal Estimation," International Statistical Review, 55, 231-244.
Godambe, V. P. and Thompson, M. E. (1989), "An Extension of Quasi-likelihood Estimation (with discussion)," Journal of Statistical Planning and Inference, 22, 137-172.
Henderson, C. R., Kempthorne, O., Searle, S. R., and von Krosigk, C. N. (1959), "Estimation of Environmental and Genetic Trends from Records Subject to Culling," Biometrics, 15, 192-218.
Hilden-Minton, J. A. (1995), Multilevel Diagnostics for Mixed and Hierarchical Linear Models, PhD thesis, The University of California, Los Angeles.
Huynh, H. (1982), "A Comparison of Four Approaches to Robust Regression," Psychological Bulletin, 92, 505-512.
Johnson, R. A. and Wichern, D. W. (1992), Applied Multivariate Statistical Analysis (3rd ed.), Englewood Cliffs, NJ: Prentice-Hall.
Jung, K.-M., Kim, M. G., and Kim, B. C. (1997), "Local Influence in Maximum Likelihood Factor Analysis," Computational Statistics and Data Analysis, 24, 483-491.
Kass, R. E., Tierney, L., and Kadane, J. B. (1989), "Approximate Methods for Assessing Influence and Sensitivity in Bayesian Analysis," Biometrika, 76, 663-674.
Kim, C. (1996a), "Local Influence and Replacement Measure," Communications in Statistics, Part A - Theory and Methods, 25, 49-61.
Kim, M. G. (1995), "Local Influence in Multivariate Regression," Communications in Statistics, Part A - Theory and Methods, 24, 1271-1278.
Kim, M. G. (1996b), "Local Influence in Box-Cox Transformation for Multivariate Regression Data," Communications in Statistics, Part A - Theory and Methods, 25, 2955-2969.
Kim, M. G. (1996c), "Local Influence in Multivariate Normal Data," Journal of Applied Statistics, 23, 535-541.
Kim, M. G. (1998), "Local Influence on a Test of Linear Hypothesis in Multiple Regression Model," Journal of Applied Statistics, 25, 145-152.
Kim, Y.-I. (1991), "Local Influence Approach Diagnostics for Optimal Experimental Design," Korean Journal of Applied Statistics, 4, 195-207.
Lavine, M. (1992), "Local Predictive Influence in Bayesian Linear Models With Conjugate Priors," Communications in Statistics, Part B - Simulation and Computation, 21, 269-283.
Lawrance, A. J. (1988), "Regression Transformation Diagnostics Using Local Influence," Journal of the American Statistical Association, 83, 1067-1072.
Lawrance, A. J. (1991), "Local and Deletion Influence," in Stahel, W. and Weisberg, S., editors, Directions in Robust Statistics and Diagnostics, Part I, pages 141-157, New York: Springer-Verlag.
Lee, A. H. and Zhao, Y. (1996), "Assessing Local Influence in Measurement Error Models," Biometrical Journal. Journal of Mathematical Methods in Biosciences, 38, 829-841.
Lee, A. H. and Zhao, Y. (1997), "Assessing Influence on the Goodness-of-link in Generalized Linear Models," Statistics & Probability Letters, 31, 351-358.
Lee, Y. and Nelder, J. A. (1996), "Hierarchical Generalized Linear Models (with discussion)," Journal of the Royal Statistical Society, Series B, 58, 619-678.
Lejeune, M. and Faulkenberry, G. D. (1982), "A Simple Predictive Density Function (with discussion)," Journal of the American Statistical Association, 77, 654-659.
Lesaffre, E., Asefa, M., and Verbeke, G. (1999), "Assessing the Goodness-of-Fit of the Laird and Ware Model - an Example: The Jimma Infant Survival Differential Longitudinal Study," Statistics in Medicine, 18, 835-854.
Lesaffre, E. and Verbeke, G. (1998), "Local Influence in Linear Mixed Models," Biometrics, 54, 570-582.
Levy, M. S. and Perng, S. K. (1984), "A Maximum Likelihood Prediction Function for the Linear Model With Consistency Results," Communications in Statistics, Part A - Theory and Methods, 13, 1257-1273.
Liang, K.-Y. and Zeger, S. L. (1986), "Longitudinal Data Analysis Using Generalized Linear Models," Biometrika, 73, 13-22.
Lindsay, B. (1982), "Conditional Score Functions: Some Optimality Results," Biometrika, 69, 503-512.
Littell, R. C., Milliken, G. A., Stroup, W. W., and Wolfinger, R. D. (1996), SAS System for Mixed Models, Cary, NC: SAS Institute.
Loynes, R. M. (1986), "Comments on 'Assessment of Local Influence'," Journal of the Royal Statistical Society, Series B, 48, 156-157.
Lu, J., Ko, D., and Chang, T. (1997), "The Standardized Influence Matrix and Its Applications," Journal of the American Statistical Association, 92, 1572-1580.
Mathiasen, P. E. (1979), "Prediction Functions," Scandinavian Journal of Statistics, 6, 1-21.
McCullagh, P. (1983), "Quasi-likelihood Functions," The Annals of Statistics, 11, 59-67.
McCullagh, P. and Nelder, J. A. (1989), Generalized Linear Models (2nd ed.), London: Chapman & Hall.
McCulloch, C. E. (1997), "Maximum Likelihood Algorithms for Generalized Linear Mixed Models," Journal of the American Statistical Association, 92, 162-170.
McCulloch, R. E. (1989), "Local Model Influence," Journal of the American Statistical Association, 84, 473-478.
Morton, R. (1981), "Efficiency of Estimating Equations and the Use of Pivots," Biometrika, 68, 227-233.
Pan, J.-X., Fang, K.-T., and Liski, E. P. (1996), "Bayesian Local Influence for the Growth Curve Model With Rao's Simple Covariance Structure," Journal of Multivariate Analysis, 58, 55-81.
Pan, J.-X., Fang, K.-T., and von Rosen, D. (1997), "Local Influence Assessment in the Growth Curve Model With Unstructured Covariance," Journal of Statistical Planning and Inference, 62, 263-278.
Patterson, H. D. and Thompson, R. (1971), "Recovery of Interblock Information When Block Sizes Are Unequal," Biometrika, 58, 545-554.
Pettitt, A. N. and Daud, I. B. (1989), "Case-weighted Measures of Influence for Proportional Hazards Regression," Applied Statistics, 38, 51-67.
199
Poon, W.Y. and Poon, Y. S. (1999), "Conformal Normal Curvature and
Assessment of Local Influence," Journal of the Royal Statistical Society,
Series 5, 61, 5161.
Prentice, R. L. (1988), "Correlated Binary Regression With Covariates Specific to
Each Binary Observation," Biometrics, 44, 10331048.
Rancel, M. M. S. and Sierra, M. A. G. (1999), "Measures and Procedures for the
Identification of Locally Influential Observations in Linear Regression,"
Communications in Statistics, Part A  Theory and Methods, 28, 343366.
Robinson, D. L. (1987), "Estimation and Use of Variance Components," The
Statistician, 36, 314.
Schall, R. and Dunne, T. T. (1992), "A Note on the Relationship Between
Parameter Collinearity and Local Influence," Biometrika, 79, 399404.
Schall, R. and Gonin, R. (1991), "Diagnostics for Nonlinear Lpnorm Estimation,"
Computational Statistics and Data Analysis, 11, 189198.
Schwarzmann, B. (1991), "A Connection Between Localinfluence Analysis and
Residual Diagnostics," Technometrics, 33, 103104.
Searle, S. R. (1971), Linear Models, New York: Wiley.
Searle, S. R. (1982), Matrix Algebra Useful for Statistics, New York: Wiley.
Searle, S. R., Casella, G., and McCuUoch, C. E. (1992), Variance Components, New
York: Wiley.
Shi, L. (1997), "Local Influence in Principal Components Analysis," Biometrika,
84,175186.
St. Laurent, R. T. and Cook, R. D. (1993), "Leverage, Local Influence and
Curvature in Nonlinear Regression," Biometrika, 80, 99106.
Swokowski, E. W. (1988), Calculus with Analytical Geometry (2nd ed.), Boston:
PWSKent.
Tang, M. K. and Fung, W. K. (1996), "First Order Local Influence of Test
Statistics in Multivariate Regression," Sankhyd, Series B, Indian Journal of
Statistics, 58, 323337.
Tawn, J. (1987), "Comments on 'Parameter Orthogonality and Approximate
Conditional Inference'," Journal of the Royal Statistical Society, Series B, 49,
3334. 
Thomas, W. (1990), "Influence on Confidence Regions for Regression Coefficients in
Generalized Linear Models," Journal of the American Statistical Association,
85, 393397.
200
Thomas, W. (1991), "Influence Diagnostics for the Cross-validated Smoothing
Parameter in Spline Smoothing," Journal of the American Statistical
Association, 86, 693-698.
Thomas, W. and Cook, R. D. (1989), "Assessing Influence on Regression
Coefficients in Generalized Linear Models," Biometrika, 76, 741-749.
Thomas, W. and Cook, R. D. (1990), "Assessing Influence on Predictions From
Generalized Linear Models," Technometrics, 32, 59-65.
Tsai, C.-L. and Wu, X. (1992), "Assessing Local Influence in Linear Regression
Models With First-order Autoregressive or Heteroscedastic Error Structure,"
Statistics & Probability Letters, 14, 247-252.
Vos, P. W. (1991), "A Geometric Approach to Detecting Influential Cases," The
Annals of Statistics, 19, 1570-1581.
Wang, S.-J. and Lee, S.-Y. (1996), "Sensitivity Analysis of Structural Equation
Models With Equality Functional Constraints," Computational Statistics and
Data Analysis, 23, 239-256.
Wang, X., Ren, S., and Shi, L. (1996), "Local Influence in Discriminant Analysis,"
Journal of Systems Science and Mathematical Science, 8, 27-36.
Wedderburn, R. W. M. (1974), "Quasi-likelihood Functions, Generalized Linear
Models, and the Gauss-Newton Method," Biometrika, 61, 439-447.
Weiss, R. (1996), "An Approach to Bayesian Sensitivity Analysis," Journal of the
Royal Statistical Society, Series B, 58, 739-750.
Weissfeld, L. A. (1990), "Influence Diagnostics for the Proportional Hazards
Model," Statistics & Probability Letters, 10, 411-417.
Weissfeld, L. A. and Schneider, H. (1990), "Influence Diagnostics for the Weibull
Model Fit to Censored Data," Statistics & Probability Letters, 9, 67-73.
Wu, X. and Luo, Z. (1993a), "Residual Sum of Squares and Multiple Potential,
Diagnostics By a Second Order Local Approach," Statistics & Probability
Letters, 16, 289-296.
Wu, X. and Luo, Z. (1993c), "Second-Order Approach to Local Influence," Journal
of the Royal Statistical Society, Series B, 55, 929-936.
Wu, X. and Wan, F. (1994), "A Perturbation Scheme for Nonlinear Models,"
Statistics & Probability Letters, 20, 197-202.
Zeger, S. L. and Liang, K.-Y. (1986), "Longitudinal Data Analysis for Discrete and
Continuous Outcomes," Biometrics, 42, 121-130.
Zhao, Y. and Lee, A. H. (1995), "Assessment of Influence in Nonlinear
Measurement Error Models," Journal of Applied Statistics, 22, 215-225.
BIOGRAPHICAL SKETCH
Glen Hartless was born in Summit, New Jersey in 1971. As he grew up, Glen
always had an interest in "keeping the numbers." While researching careers in
applied mathematics in the tenth grade, he stumbled upon the statistics profession
and knew that it was perfect for him. (Considering the content of his dissertation, it
is ironic that frustration with the abstractness of matrices is what led him to the
field.) Interestingly, while he was still in high school, his father arranged for him to
meet Daryl Pregibon, a pioneer in developing diagnostics for logistic regression.
Perhaps this chance meeting influenced Glen more than he knows. As an
undergraduate at Virginia Tech, he completed a thesis on nonparametric density
estimation. Glen also studied exploratory graphical techniques for analyzing
reliability data while at an NSF-sponsored Research Experience for Undergraduates
(REU) program at the College of William & Mary. After receiving his B.S. in
statistics from Virginia Tech in 1994, he attended the University of Florida in
pursuit of his Master of Statistics and Ph.D. degrees. There, he did collaborative
research for four years in the Department of Family, Youth & Community Sciences.
Glen also met and married his wife, Christine, in Gainesville.
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor of Philosophy.
James G. Booth, Chairman
Professor of Statistics
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor of Philosophy.
Ramon C. Littell, Cochairman
Professor of Statistics
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor of Philosophy.
James Hobert
Associate Professor of Statistics
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor of Philosophy.
Randolph L. Carter
Professor of Statistics
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor of Philosophy.
James J. Algina
Professor of Educational Psychology
This dissertation was submitted to the Graduate Faculty of the Department of
Statistics in the College of Liberal Arts and Sciences and to the Graduate School
and was accepted as partial fulfillment of the requirements for the degree of Doctor
of Philosophy.
August 2000
Dean, Graduate School